PCT 



WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification:
     C12N 15/12, 15/63, 15/81, C12Q 1/68,
     C07K 14/47, G01N 33/53                              A1

(11) International Publication Number: WO 96/12016

(43) International Publication Date: 25 April 1996 (25.04.96)

(21) International Application Number: PCT/CA95/00581

(22) International Filing Date: 17 October 1995 (17.10.95)

(30) Priority Data:
     9421019.2    18 October 1994 (18.10.94)     GB
     2,138,425    19 December 1994 (19.12.94)    CA



(71) Applicants (for all designated States except US): THE
UNIVERSITY OF OTTAWA [CA/CA]; 550 Cumberland Street,
Ottawa, Ontario K1N 6N5 (CA). RESEARCH DEVELOPMENT
CORPORATION OF JAPAN [JP/JP]; Kawaguchi Centre Building,
4-1-8, Honcho, Kawaguchi-shi, Saitama 332 (JP).

(72) Inventors; and
(75) Inventors/Applicants (for US only): MacKENZIE, Alex, E.
[CA/CA]; 35 Rockliffe Way, Ottawa, Ontario K1N 1A3
(CA). KORNELUK, Robert, G. [CA/CA]; 1901 Tweed
Avenue, Ottawa, Ontario K1G 2L8 (CA). MAHADEVAN,
Mani, S. [CA/CA]; Apartment 4, 818 South Grammon Road,
Madison, WI 53719 (US). McLEAN, Michael [CA/CA]; 1
Halesmanor Court, Guelph, Ontario N1G 4E1 (CA). ROY,
Natalie [CA/CA]; 6 McLeod Street, Ottawa, Ontario K2P
0Z5 (CA). IKEDA, Joh-E [JP/JP]; 31-1, Kamimeguro
5-chome, Meguro-ku, Tokyo 153 (JP).

(74) Agent: WOODLEY, John, H.; Sim & McBurney, Suite 701,
330 University Avenue, Toronto, Ontario M5G 1R7 (CA).



(81) Designated States: AL, AM, AT, AU, BB, BG, BR, BY, CH,
CN, CZ, DE, DK, EE, ES, FI, GB, GE, HU, IS, JP, KE,
KG, KP, KR, KZ, LK, LR, LT, LU, LV, MD, MG, MK,
MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI,
SK, TJ, TM, TT, UA, UG, US, UZ, VN, European patent
(AT, BE, CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC,
NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, GA,
GN, ML, MR, NE, SN, TD, TG), ARIPO patent (KE, MW,
SD, SZ, UG).

Published
With international search report.
Before the expiration of the time limit for amending the
claims and to be republished in the event of the receipt of
amendments.



(54) Title: NEURONAL APOPTOSIS INHIBITOR PROTEIN, GENE SEQUENCE AND MUTATIONS CAUSATIVE OF SPINAL
MUSCULAR ATROPHY

(57) Abstract

The gene for the autosomal recessive neurodegenerative disorder Spinal Muscular Atrophy has been mapped to a region of chromosome
5. The gene encodes a protein having homology with apoptosis inhibitor proteins of viruses so that the encoded protein has been labelled
as a neuronal apoptosis inhibitor protein (NAIP). A deletion in the NAIP domain was identified in persons with Type I, II and III Spinal
Muscular Atrophy (SMA) and not in the normal non-SMA population.



FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international
applications under the PCT.

AT  Austria                   GB  United Kingdom                         MR  Mauritania
AU  Australia                 GE  Georgia                                MW  Malawi
BB  Barbados                  GN  Guinea                                 NE  Niger
BE  Belgium                   GR  Greece                                 NL  Netherlands
BF  Burkina Faso              HU  Hungary                                NO  Norway
BG  Bulgaria                  IE  Ireland                                NZ  New Zealand
BJ  Benin                     IT  Italy                                  PL  Poland
BR  Brazil                    JP  Japan                                  PT  Portugal
BY  Belarus                   KE  Kenya                                  RO  Romania
CA  Canada                    KG  Kyrgystan                              RU  Russian Federation
CF  Central African Republic  KP  Democratic People's Republic of Korea  SD  Sudan
CG  Congo                                                                SE  Sweden
CH  Switzerland               KR  Republic of Korea                      SI  Slovenia
CI  Côte d'Ivoire             KZ  Kazakhstan                             SK  Slovakia
CM  Cameroon                  LI  Liechtenstein                          SN  Senegal
CN  China                     LK  Sri Lanka                              TD  Chad
CS  Czechoslovakia            LU  Luxembourg                             TG  Togo
CZ  Czech Republic            LV  Latvia                                 TJ  Tajikistan
DE  Germany                   MC  Monaco                                 TT  Trinidad and Tobago
DK  Denmark                   MD  Republic of Moldova                    UA  Ukraine
ES  Spain                     MG  Madagascar                             US  United States of America
FI  Finland                   ML  Mali                                   UZ  Uzbekistan
FR  France                    MN  Mongolia                               VN  Viet Nam
GA  Gabon











WO 96/12016                                        PCT/CA95/00581



NEURONAL APOPTOSIS INHIBITOR PROTEIN, GENE SEQUENCE
AND MUTATIONS CAUSATIVE OF SPINAL MUSCULAR ATROPHY

FIELD OF THE INVENTION
The gene for the neuronal apoptosis inhibitor
protein (NAIP) has been identified in the q13 region of
chromosome 5. Mutations in this gene have been diagnosed
in individuals with Type I, II and III Spinal Muscular
Atrophy. The amino acid sequence of the neuronal
apoptosis inhibitor protein is provided and homology to
viral apoptosis proteins demonstrated.

BACKGROUND OF THE INVENTION 

In order to facilitate reference to various journal
articles in the discussion of various aspects of this
invention, a complete listing of the references is
provided at the end of the disclosure. Otherwise the
references are identified in the disclosure by first
author's name and publication year of the reference.

The childhood spinal muscular atrophies (SMAs) are a
group of autosomal recessive, neurodegenerative disorders
classified into three types based upon the age of onset
and clinical progression (Dubowitz et al., 1978; Dubowitz
et al., 1991). All three types are characterized by the
degeneration of the alpha motor neurons of the spinal
cord manifesting as weakness and wasting of the proximal
voluntary muscles. Type I SMA is the most severe form
with onset either in utero or within the first few months
of life. Affected children are unable to sit unsupported
and are prone to recurrent chest infections due to
respiratory insufficiency, thus rarely surviving the
first few years of life (Dubowitz et al., 1978; Dubowitz
et al., 1991). This acute form, with a carrier frequency
of 1/60 to 1/80, is one of the most frequent fatal
autosomal recessive disorders. Affected children with
Type II SMA never walk unaided and although the prognosis
is variable, such children may die in adolescence. Those
affected with Type III SMA maintain independent
ambulation but develop weakness any time between the age
of 3 to 17 years manifesting a mildly progressive course
(Dubowitz et al., 1978; Dubowitz et al., 1991).

In 1990, all three childhood forms of SMA were
genetically mapped to the long arm of chromosome 5 at
5q11.2-13.3 (Brzustowicz et al., 1990; Gilliam et al.,
1990; Melki et al., 1990). Subsequent multi-point
linkage analyses and the identification of recombinant
events have further localized the genetic defect to the
region flanked centromerically by D5S435/D5S629 (Soares
et al., 1993; Wirth et al., 1993; Clermont et al., 1994)
and telomerically by MAP1B/D5S112 (Wirth et al., 1994;
MacKenzie et al., 1993; Lien et al., 1991). This
interval has been refined by the more recent
identification of recombination events indicating that
the SMA gene lies distal to CMS-1 (Yaraghi et al.,
submitted to Human Genetics; van der Steege et al.,
submitted to Human Genetics) and proximal to D5S557
(Francis et al., 1993). We and others have detected
chromosome 5-specific repetitive sequences with
particular abundance in the D5S629/CMS-D5S557 region
(Francis et al., 1993; Thompson et al., 1993), which has
impeded the isolation and ordering of both clones and
simple tandem repeats. An array of cosmid clones
spanning the 200 kb CMS-1 (Kleyn et al., 1993)/CATT-1
(Burghes et al., 1994; McLean et al., in
press)/D5F150/D5F149/D5F153 (Melki et al., 1994) region
within this interval has been constructed.

We established a contiguous array of YAC clones
encompassing the SMA containing D5S435 - D5S112 interval
of 5q13.1. We then discovered a gene within this
interval of 5q13.1 which coded for a neuronal apoptosis
inhibitor protein (NAIP). Further studies demonstrated
that a deletion in this gene was found in Type I, II and
III Spinal Muscular Atrophy.

SUMMARY OF THE INVENTION 

A gene encoding a neuronal apoptosis inhibitor
protein (NAIP) was discovered in the q13 region of human
chromosome 5. According to an aspect of the invention, the
cDNA sequence coding for the neuronal apoptosis inhibitor
protein is provided and set out in Table 4. According to
another aspect of the invention, the predicted amino acid
sequence of the neuronal apoptosis inhibitor protein is
provided from the cDNA sequence.

According to another aspect of the invention, a
deletion of the neuronal apoptosis inhibitor protein gene
was discovered in persons with Type I, II and III Spinal
Muscular Atrophy disease. The discovery of the neuronal
apoptosis inhibitor protein gene deletion provides a
diagnostic indicator for use in the diagnosis of Spinal
Muscular Atrophy.

In order to facilitate a further description of
various aspects of the invention, reference will be made
to various Figures of the drawings. A brief description
of the drawings follows this invention summary section.

According to a further aspect of the invention, a
human gene is provided which maps to the SMA containing
region of chromosome 5q13. The gene comprises exons 1
through 17 of approximately 5.5 kb and having a
restriction map for exons 2 through 11, as shown in
Figure 8.

According to a further aspect of the invention,
exons 1 through 17 have a restriction map for exons 2
through 16, as shown in Figure 9D.

According to another aspect of the invention, a
human gene of the above aspects wherein exons 5 through
16 code for the NAIP protein having an amino acid
sequence biologically functionally equivalent to the
amino acid sequence of Sequence ID No. 2.

According to another aspect of the invention, the
human gene of the above aspects has exons 5 through 16
with a cDNA sequence biologically functionally equivalent
to the cDNA sequence of Sequence ID No. 1.

According to another aspect of the invention, a
purified nucleotide sequence comprises genetic DNA, cDNA,
mRNA, anti-sense DNA or homologous DNA corresponding to
the cDNA sequence of Sequence ID No. 1.

According to another aspect of the invention, a DNA
molecule sequence coding for the NAIP protein having
Sequence ID No. 2.

According to another aspect of the invention, a
purified DNA sequence consists essentially of DNA
Sequence ID No. 1.

According to another aspect of the invention, a
purified DNA sequence consists essentially of a DNA
sequence coding for amino acid Sequence ID No. 2.

According to another aspect of the invention, a
purified DNA sequence comprises at least 18 sequential
bases of Sequence ID No. 1. DNA probes, PCR primers, DNA
hybridization molecules and the like may be provided by
using the purified DNA sequence of at least 18 sequential
bases.

According to another aspect of the invention, use of
the DNA sequences of the above aspects in the
construction of a cloning vector or an expression vector.

According to another aspect of the invention, NAIP
protein encoded by the above DNA sequences.

According to another aspect of the invention, NAIP
protein comprising an amino acid sequence biologically
equivalent to the amino acid sequence of Sequence ID No.
2.

According to another aspect of the invention, NAIP
protein consisting essentially of the amino acid sequence
of Sequence ID No. 2.

According to another aspect of the invention, NAIP
protein fragment comprises at least 15 sequential amino
acids of Sequence ID No. 2.

According to another aspect of the invention, use of
the above amino acid sequences in the production of
hybridomas.

According to another aspect of the invention, a
method is provided for analyzing a biological sample to
determine the presence or absence of a gene encoding NAIP
protein.

The method comprises:

i) providing a biological sample derived from the
SMA containing region q13 of chromosome 5;
ii) conducting a biological assay to determine
presence or absence in the biological sample of at least
a member selected from the group consisting of:

a) NAIP DNA Sequence ID No. 1, and

b) NAIP protein Sequence ID No. 2.

DESCRIPTION OF DRAWINGS

The original numbering of exons for the NAIP gene
began with exon 0 and progressed through exon 16. This
is identified in drawings as sequence numbering Scheme
#1. However, for conventional exon numbering, it is
preferable to begin with exon 1 and progress through to
exon 17. This is now identified as sequence numbering
Scheme #2.
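
The relation between the two numbering schemes is a fixed offset of one exon, which can be sketched as follows. The function names are illustrative only and do not appear in the disclosure; only the ranges (exons 0 through 16 under Scheme #1, exons 1 through 17 under Scheme #2) are taken from the text.

```python
def scheme1_to_scheme2(exon: int) -> int:
    """Convert a Scheme #1 exon number (0-16) to Scheme #2 (1-17)."""
    if not 0 <= exon <= 16:
        raise ValueError("Scheme #1 exon numbers run from 0 to 16")
    return exon + 1


def scheme2_to_scheme1(exon: int) -> int:
    """Convert a Scheme #2 exon number (1-17) to Scheme #1 (0-16)."""
    if not 1 <= exon <= 17:
        raise ValueError("Scheme #2 exon numbers run from 1 to 17")
    return exon - 1


# The frequently deleted exons are 4 and 5 under Scheme #1,
# that is, 5 and 6 under Scheme #2 (see Figure 8).
deleted_scheme2 = [scheme1_to_scheme2(e) for e in (4, 5)]
print(deleted_scheme2)
```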

Figure 1: YAC contiguous array of the SMA gene
region. YACs are represented by solid lines. Open
triangles represent polymorphic STRs, solid triangles
represent STSs, open squares represent single copy
probes. The genetically defined SMA interval, CMS-1-SMA-
D5S557 and the previous D5S629-SMA-D5S557 interval, are
indicated above the YACs.

Figure 2: Long range restriction map of the SMA
region. Rare cutter sites are indicated above the solid
line. A minimal set of markers is indicated below the
solid line. t corresponds to the pYAC4 tryptophan or left
end. u corresponds to the pYAC4 uracil or right end. The
genetically defined CMS-1-SMA-D5S557 and the D5S629-SMA-
D5S557 interval are estimated at 550 kb and 1.1 Mb
respectively.

Figure 3: Amplification of the CATT-1 locus.
Allele sizes are shown below each lane. (A)
Amplification of YACs. G: genomic DNA. (B) Amplification
of cosmids derived from the chromosome 5 flow sorted
library. The 4 distinct alleles are represented by
cosmids 40G1 (allele 15), 58G12 (allele 12), 192F7
(allele 10) and 250B6 (allele 7).

Figure 4: A representative subset of mapped cosmids
from our contiguous array. Vertical lines above the
solid line are the positions of EcoRI sites. Open
triangles represent polymorphic STRs, filled triangles
represent STSs, filled squares represent single copy
probes and open squares represent transcribed sequences.
The STRs which demonstrate strong linkage disequilibrium
with Type I SMA are indicated by stars. Cosmids 163 and
1B9 are from the YAC 76C1 cosmid library.

Figure 5: Sequence duplication in the SMA region
identified by p151.2. Hybridization of YACs with (A) the
700 bp fragment and (C) the 500 bp fragment. YACs are
arranged from left to right, centromeric to telomeric.
Hybridization of cosmids with (B) the 700 bp fragment and
(D) the 500 bp fragment. (B) The 12 kb fragment is
detected in the cosmids; however, the 20 kb fragment is not
present. The 2.5 kb and 600 bp fragments detected in 3B3
and 1E1 respectively are end fragments. (D) Only the 3 kb
fragment is detected in the cosmids. Note the absence of
the 20 kb band in 24D6 in (A) but its presence in (C).
The 700 bp fragment may be deleted in 24D6.

Figure 6: Degree of linkage disequilibrium observed
between Type I SMA and various polymorphic 5q13.1 markers
giving a disequilibrium peak at 40G1.

Figure 7: A PAC contiguous array containing the
CATT region comprised of nine clones and extending
approximately 400 kb. The 2.2 kb transcript referred to
as GA1 is shown.

Figure 8: Structural organization of the SMA gene.
The exons are represented by black boxes and numbered
above. The positions of restriction sites are shown: B,
BamHI; E, EcoRI; N, NotI. Exons 4 and 5 (Scheme #1) or
Exons 5 and 6 of Scheme #2 are frequently deleted in all
types of SMA.

Figure 9 is a single page alignment of the
information of Figures 6, 1, 7 and 8, respectively.
Figure 9(A) is a correlation of the degree of linkage
disequilibrium observed in type I SMA families between
the disease phenotype and six 5q13.1 markers with the
physical map. The SMA containing interval defined by the
key recombinations described in the text is shown. Note
the proximity of the disequilibrium peak with the
centromeric end of the recombinant defined SMA interval.

Figure 9(B) is a YAC contiguous array covering the
SMA region of 5q13.1. For both YAC and PAC contigs, STSs
are denoted by solid triangles, polymorphic tandem repeat
polymorphisms by empty triangles, single copy clones by
solid squares. Note that our physical map places the CMS
sublocus containing allele 9, marked with an asterisk,
telomeric to the other CMS subloci, while the reverse was
observed with genetic recombination data, reflecting, we
believe, the variation that exists in this region of
5q13.1.

Figure 9(C) is a PAC contiguous array covering the
SMA region of 5q13.1.

Figure 9(D) is the gene structure of NAIP as
provided in more detail in Figure 8.

Figure 10: Exon content of PAC, fetal brain cDNA
clones from non-SMA individuals and RT-PCR clones from
SMA affected individuals. E158 refers to the deletion of
a glutamate residue. The RT-PCR was performed only
between exons 13 and 4 (Scheme #2); additional,
undetected deletions may exist outside of this region.

Figure 11: Structure of intact and internally
deleted/truncated versions of the NAIP gene as found in
the indicated PACs. In Figure 11A, exons under Scheme #2
are marked as numbered black boxes. N refers to NotI
sites, B to BamHI and E to EcoRI sites. The EcoRV clone
that detects the 3 and 9.4 kb EcoRI bands referred to in
the text is denoted by EV in Figure 11B. The 4.8 kb
EcoRI/BamHI band deleted in Figure 14 is also depicted.
The 6 kb region containing exons 5 and 6 (Scheme #2) and
the 23 kb BamHI fragment resulting from this deletion are
both shown in Figures 11C and 11D. The location of
primers utilized to identify deletions of exons 5 and 6 as
well as those that identify the truncated fragment in the
deleted NAIP gene are shown above the NAIP structure.

Figure 12: Intron/exon splice sequences of the NAIP
gene.

Figure 13: Northern blot of adult tissues probed
with exon 13 (Scheme #2) of the NAIP locus. Tissues are
as marked and the filters were washed at 50°C, 0.2X SSC
and exposed for 4 days. Bands can be seen in liver and
placenta in the 6-7 kb range.

Figure 14: Pedigree and Southern blot analysis of
consanguineous French-Canadian type III SMA families.
Upper panel: probing of a filter containing BamHI/EcoRI
digested genomic DNA with a cDNA probe encompassing exons
2 through 9 (Scheme #2) of NAIP reveals the loss of the
4.8 kb fragment that contains exons 5 and 6 (Scheme #2)
in all affected individuals, resulting in an in-frame
deletion. All others, save for the homozygous normal
sister and brother, show half dosage for this band. The
lower panel shows a BamHI digest of the same family. In
affected individuals two superimposed 14.5 kb contiguous
fragments have sustained the 6 kb deletion of sequence
containing a BamHI site, resulting in the generation of a
23 kb band (see Figure 11). Note the existence of the 23
kb BamHI band in all individuals in the pedigree, in
keeping with its general dispersion in the population.
Similarly, the 9.6 kb BamHI band representing the
deletion of exons 1 through 6 (Scheme #2), which is
contained in PAC 238D12 and depicted in Figure 11, can be
seen in all individuals including non-SMA carriers.

Figure 15: Results of PCR amplification in type 3
families 21470 and 24561 using primers 1864 and 1863
which amplify exon 5 (Scheme #2). The reactions were
multiplexed with exon 13 (Scheme #2) primers 1258 and
1343 to rule out PCR failure obscuring the results.
Failure of amplification in keeping with the homozygous
absence of exon 5 (Scheme #2) can be seen to co-segregate
with the disease phenotype.

Figure 16: RT-PCR amplification of RNA from SMA and
non-SMA tissues. The letter n refers to RNA from non-SMA
tissue and a to RNA from SMA affected tissue. The tissue
source is shown above each panel. Lym refers to
lymphoblast and fib to fibroblast. All samples were from
type 1 SMA patients with the exception of a5 which is
from an affected member of the consanguineous type 3 SMA
family 24561 shown in Figure 15.

RNA was reverse transcribed from exon 13 (Scheme
#2). Primary PCR of products shown in panels A and B was
with exon 1 primer 1884 and exon 13 primers 1285 or 1974
and those in panel C with exon 6 primer 1919 and exon 13
primer 1285. Secondary PCR reactions for panel A used
exon 4 primer 1886 and exon 13 primer 1974; for panel B,
exon 5 primer 1864 and exon 11 primer 1979; and for panel
C, exon 9 primer 1844 and exon 13 primer 1974.

Failure of amplification, or amplification of reduced
size products, can be seen in panel A for spinal cord and
lymphoblast tissue for samples a2, a3, a4, a5, a6 and a7.
Panel B also shows amplification of reduced size bands in
a2 and a3, and in a7 a larger product in keeping with an
insertion. Panel C shows reduced band size in keeping with
deletions of exons 11 and 12 (Scheme #2) in a2, a3, a9 and
a11.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Unless indicated otherwise, reference to exons in
this detailed description of the invention will be based
on exon numbering Scheme #2.

Throughout the specification, various letter
abbreviations will be used to identify various components
or techniques. The following glossary is provided to
reference these items.

CTR - complex tandem repeat

DNA - deoxyribonucleic acid

PCR - polymerase chain reaction

PFGE - pulsed field gel electrophoresis

PAC - P1 artificial chromosome

RNA - ribonucleic acid

RT-PCR - reverse transcriptase-polymerase chain
reaction

STR - simple tandem repeat

STS - sequence tag site

YAC - yeast artificial chromosome



This invention is directed to the identification,
location and sequence characteristics of a gene which
encodes Neuronal Apoptosis Inhibitor Protein (NAIP). We
have established that mutations in this gene are
causative of the previously discussed types I, II and III
of Spinal Muscular Atrophies (SMA). It is believed that
mutations in this gene result in a lack of production of
normal NAIP protein, which is believed to be
physiologically involved in the normal human process of
maintaining neurological cells and preventing the early
cell death common to affected individuals. The subject gene
maps to the SMA containing region of chromosome 5q13.1.
Unless indicated otherwise, reference to exons in this
detailed description of the invention will be based on
exon numbering Scheme #2. The gene comprises exons 1
through 17 of approximately 5.5 kb and has a restriction
map for exons 2 through 11, as shown in Figure 8. An
updated restriction map for exons 2 through 16 is
provided in Figures 9D and 11A. As is appreciated, the
gene is considerably longer than the sequence for exons 1
through 17. Considerable intron information exists
between the exons which has not yet been sequenced. From
the standpoint of diagnosing SMA, the sequence
information of exons 1 through 17 is very valuable. The
normal sequence is provided in Table 4, as well as being
listed under Sequence ID No. 1. Any genetic mutation,
that is, any change in the DNA sequence, whether due
to deletion, entire absence of the gene, substitution or
polymorphisms and the like, is or can be causative of
the disease. The most common mutations are thought to
be:

i) deletion of exons 5, 6 of the gene; or
ii) absence or marked reduction in the copy number
of this gene in chromosome 5, which can be causative if the
remaining genes are defective.

Any form of biological assay may be employed to
diagnose a person's susceptibility to SMA by virtue of
conducting a biological assay to determine the normal
sequence or absence or presence of mutations in the
normal sequence. Such biological assays may include DNA
hybridization by use of DNA probes and the like,
restriction enzyme analysis, PCR amplification of the
relevant portions of the sequence, messenger RNA
detection and DNA sequencing of the relevant portions of
the sequence, as isolated from chromosome 5 of the human
biological sample. It is appreciated that a variety of
the above generally identified biological assay
procedures may be conducted where the preferred
techniques are as follows:

SMA diagnoses will be conducted in two ways.
Initially, the genome of the human at risk will be
assayed for the absence of NAIP exons 5 and 6. These
exons are found to be absent with a frequency of 0.05% in
the general population and 50% in Type 1 SMA. The second
approach will be to assess the number of copies of the
NAIP gene in the individuals being tested. We have
observed that there is a general depletion of both
deleted and intact forms of the NAIP gene in individuals
with SMA. By using a densitometric approach to assess
the number of gene copies, an accurate assessment of the
risk of having SMA can be established. The best correlation
is observed for exons 2 through 4 and exon 13.
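
To illustrate how strongly the first assay discriminates, the two frequencies quoted above can be compared as a simple ratio. This is a minimal sketch that takes the stated figures (50% in Type I SMA, 0.05% in the general population) at face value; the function name is illustrative, and the ratio is not a clinical risk estimate.

```python
def likelihood_ratio(freq_affected: float, freq_general: float) -> float:
    """Ratio of how often a finding occurs in affected versus general populations."""
    if not 0.0 <= freq_affected <= 1.0:
        raise ValueError("freq_affected must lie in [0, 1]")
    if not 0.0 < freq_general <= 1.0:
        raise ValueError("freq_general must lie in (0, 1]")
    return freq_affected / freq_general


# Frequencies as stated in the text: 50% in Type I SMA,
# 0.05% in the general population.
lr = likelihood_ratio(0.50, 0.0005)
print(f"Absence of exons 5 and 6 is about {lr:.0f} times more frequent in Type I SMA")
```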
In practical terms, the two steps outlined above
will be conducted in the following manner:

(i) two concurrent PCR reactions will be carried
out upon the same aliquot of DNA (0.1 micrograms) from
the human in question. One primer pair will map into
exons 5 and 6 (e.g. primers 1863 Sequence ID No. 7 and
1864 Sequence ID No. 8) and one pair will be homologous
to a region outside of exons 5 and 6 (primers 1343
Sequence ID No. 5 and 1258 Sequence ID No. 4). The
latter reaction will be performed to ensure that the PCR
is functioning. Two additional controls will be (i) PCR
performed on genomic DNA known to contain exons 5 and 6
employing the appropriate primers to ensure that this
particular reaction is working, and (ii) negative controls
using water as a template to ensure absence of
contamination. All PCR products will be placed in an
agarose gel, separated electrophoretically and analyzed
visually.

(ii) Densitometric assessment of SMA risk will be
carried out by using PCR primers tagged with fluorescent
dyes. PCR reactions employing primers for exons 2
through 4, exon 13 as well as exons 5, 6 and exons 11,
12 will be performed on genomic DNA from the individual
being assessed. PCR products will be separated
electrophoretically on a gel and the intensity of the
individual bands assessed fluorometrically. These values
will be correlated with normative values and SMA risk
thus ascertained.
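
The reading of the gel in step (i) and the normalization in step (ii) can be sketched as follows. This is an illustration under stated assumptions, not the applicants' procedure: the class, field and function names are hypothetical, the band calls are assumed to have been made by eye as described, and the normative intensities are placeholders that would have to come from reference samples. A call of "absent" here corresponds to homozygous absence, since a single intact copy would still amplify.

```python
from dataclasses import dataclass


@dataclass
class MultiplexResult:
    exon5_6_band: bool      # product from the exon 5/6 primer pair in the test sample
    exon13_band: bool       # product from the exon 13 primer pair in the same reaction
    positive_control: bool  # exon 5/6 product from DNA known to contain exons 5 and 6
    water_control: bool     # any product in the water-only (no template) lane


def interpret(result: MultiplexResult) -> str:
    """Call the exon 5/6 status, or report why the run cannot be read."""
    if result.water_control:
        return "invalid: contamination in water control"
    if not result.positive_control:
        return "invalid: exon 5/6 reaction failed on known-positive DNA"
    if not result.exon13_band:
        return "invalid: PCR failed on test sample"
    return "exons 5 and 6 present" if result.exon5_6_band else "exons 5 and 6 absent"


def relative_dosage(sample: dict, normative: dict) -> dict:
    """Band intensity of each amplicon as a fraction of its normative value."""
    return {name: sample[name] / normative[name] for name in sample}


# Hypothetical run: the exon 13 control amplifies, the exon 5/6 product does not.
call = interpret(MultiplexResult(False, True, True, False))
# Hypothetical intensities in arbitrary fluorescence units.
dosage = relative_dosage({"exon 13": 50.0}, {"exon 13": 100.0})
print(call, dosage)
```

Checking the controls before the target band means that a missing exon 5/6 product is only reported as a deletion when the same reaction demonstrably worked.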

It is apparent that one's level of NAIP correlates
with the risk for other neurodegenerative disorders such
as amyotrophic lateral sclerosis and Alzheimer's disease.
Consequently, the tests outlined above serve as
predictors of risk for these disorders as well. As is
described in more detail in the section under heading
Baculoviral IAPs, the NAIP protein has significant
homology with proteins for inhibiting cell apoptosis.
Hence, any neurodegenerative disease which is based on
neuronal cell apoptosis can now be predicted by use of
the DNA sequence information of the NAIP gene. Such
neuronal cell apoptosis is most likely linked to
mutations in the NAIP gene similar to the mutations
associated with SMA or other mutations in the gene which
affect the biological activity of the NAIP protein
inhibiting neuronal apoptosis.

As to mRNA detection we propose the following:
RT-PCR is a rapid technique for the analysis of RNA
transcripts which is a crucial part of several molecular
biology applications. This method is much more sensitive
and efficient than traditional Northern blot, RNA
dot/slot blots, and in situ hybridization assays. The
sensitivity of such a technique allows one to study RNA
transcripts of low abundance or RNA isolated from small
amounts of cells. In addition, an entire panel of
transcripts can be analyzed simultaneously.

Protocol Summary: RNA is first isolated from
tissues or cells and then is used as a template for
reverse transcription to complementary DNA (cDNA). The
reverse transcription (RNA-directed synthesis of DNA) is
catalyzed by the enzyme reverse transcriptase. The cDNA
is then used as the template for PCR using primers
designed to amplify a selected cDNA region. Following
PCR, the product is analyzed by agarose gel
electrophoresis. The amplified cDNA is identified by the
size of the PCR product which is predicted from knowledge
of the cDNA nucleotide sequence. The PCR product can be
further validated by restriction digestion, hybridization
or nucleotide sequencing.
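
The size prediction mentioned in the protocol summary follows directly from the primer positions on the cDNA, and a transcript lacking whole exons gives a correspondingly smaller product. The sketch below assumes 1-based coordinates on the sense strand; the function names, coordinates and exon lengths are hypothetical and chosen for illustration, not taken from Sequence ID No. 1.

```python
def expected_product_size(forward_5prime: int, reverse_5prime: int) -> int:
    """Length in bp of the amplicon between two primers.

    Both arguments are 1-based positions on the cDNA sense strand: the 5'
    base of the forward primer, and the base that pairs with the 5' base of
    the reverse primer.
    """
    if forward_5prime < 1 or reverse_5prime < forward_5prime:
        raise ValueError("primer positions must satisfy 1 <= forward <= reverse")
    return reverse_5prime - forward_5prime + 1


def size_with_exons_missing(full_size: int, missing_exon_lengths: list) -> int:
    """Expected amplicon size when whole exons are absent from the transcript."""
    reduced = full_size - sum(missing_exon_lengths)
    if reduced <= 0:
        raise ValueError("missing exons exceed the amplicon length")
    return reduced


# Hypothetical coordinates and exon lengths, for illustration only.
full = expected_product_size(101, 600)
short = size_with_exons_missing(full, [120, 80])
print(full, short)
```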

Enzymatic Amplification of RNA by PCR (RT-PCR) 

This method is used to enzymatically amplify RNA 
using PCR. 

Detailed Protocol: First the primer is annealed to
the RNA. The RNA and cDNA primer are coprecipitated by
adding together poly(A)+ RNA, cDNA primer, and water.
Sodium acetate and ethanol are added. This is
precipitated overnight at -20°C. The pellet is
collected after microcentrifugation. The pellet is
washed with ethanol. Then water, Tris-HCl, and KCl are
added and the mixture is heated to 90°C and then cooled
slowly to 67°C. Microcentrifuge and incubate 3 hours at
52°C. This final annealing temperature may be adjusted
according to base composition of primer. Alternatively,
the primer can be annealed to the RNA by mixing
poly(A)+ RNA, cDNA primer, and water. This mixture is
heated 3 to 15 minutes at 65°C. To the cooled mixture,
add reverse transcriptase buffer.

The cDNA is now synthesized. Add reverse
transcriptase buffer and AMV reverse transcriptase. This
is mixed and incubated 1 hour at 42°C (depending on the
base composition of primer and RNA). Add Tris-Cl/EDTA,
mix, then add buffered phenol and vortex. Microcentrifuge and
add chloroform to the aqueous phase and vortex.
Microcentrifuge. Add sodium acetate and ethanol to
aqueous phase. Mix and precipitate overnight at -20°C.
Microcentrifuge, dry pellet, and resuspend in water.

The cDNA is then amplified by PGR. The mixture 
contains prepared cDNA, amplification, dNTP mix, 
amplification buffer, and water. Usually one of the 

25 amplification primers is the same as cDNA primer. If a 
different amplification primer is used, the cDNA primer 
should be removed from the cDNA reaction. The reaction 
mixture is then heated 2 minutes at 94 ^C, and 
microcentrifuged to collect condensate. Add Tag DNA 

30 polymerase, mix, centrifuge, overlay with mineral oil. 
Set up amplification cycles. The number of cycles is 
varied depending upon the abundance of RNA. Forty cycles 
are usually sufficient. The products are then analyzed by 
gel electrophoresis in agarose or nondenaturing 

35 polyacrylamide gels. The cDNA can also be introduced 
directly into the amplification step. 

In referencing the gene, its cDNA sequences, other
DNA sequences and RNA sequences, it is understood that
any specifically referenced sequence includes any and all
biologically functional equivalents thereof. Similarly,
with listed protein sequences, it is understood that such
terminology includes any and all biologically functional
equivalents thereof insofar as the intended purpose is
concerned. In the above identified biological assays it
is understood that the full length or partial length
sequences of the DNA or protein may be used. Generally
it is contemplated that at least 18 sequential bases of
the DNA sequence are useful as hybridization probes, PCR
primers and the like. Similarly, with protein sequences,
at least 15 sequential amino acids may be
correspondingly useful in developing protein receptors
such as monoclonal antibodies. Such monoclonal
antibodies may be made in accordance with the standard
techniques by developing hybridomas for producing
monoclonals specific to certain antigenic determinants of
the protein structure.
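
The idea of taking any run of at least 18 sequential bases as a candidate probe or primer can be illustrated with a short sliding-window sketch. This is only an illustration: the function name `probe_windows` and the 24-base example sequence are hypothetical and are not taken from sequence ID No. 1.

```python
def probe_windows(sequence: str, length: int = 18) -> list[tuple[int, str]]:
    """Return (1-based start position, subsequence) for every window of `length` bases."""
    seq = sequence.upper()
    return [(i + 1, seq[i:i + length]) for i in range(len(seq) - length + 1)]


# Hypothetical 24-base sequence, used only for illustration.
example = "ATGGCCACTCAGCAGAAAGCCTCT"

for start, window in probe_windows(example):
    print(start, window)
```

A sequence shorter than the window length simply yields no candidates. The same windowing applies to 15-residue stretches of a protein sequence by passing `length=15`.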

With reference to Table 4, it would appear that in
view of the significant homology of exons 5, 6, 7, 8, 9,
10, 11 and 12 with the IAP domains, such homology may well
mean that any deletions or other forms of mutations in
these exons may result in the carrier being susceptible
to the disease. For example, this is evidenced by the
deletion of exons 5 and 6 in low copy numbers in humans
being causative of the disease. Hence, any of the
sequence information in this region of the gene will be
important from a diagnosis standpoint so that any
sequential 18 bases of DNA or 15 sequential amino acid
residues in this region may be relied on in the diagnosis
of SMA in suspected humans. It is of course also
understood that other forms of deletions, mutations,
polymorphisms and the like in other regions of the gene
may be causative of the disease or may be used for other
purposes, in conjunction with disease analysis, prognosis
and perhaps treatment.

Although the restriction maps are useful in
identifying the characterizing features of the subject
gene, the specific cDNA sequence of exons 1 through 17 has
been provided in sequence ID No. 1. The encoding portion
of the sequence commences at the ATG codon at base 396 of
exon 5. The encoding portion ends at the stop codon TAA
of exon 16 at base position 4092. Exons 1 through 4 are
in the 5' untranslated region and exon 17 is in the
3' untranslated region. As with some genetic related
diseases, mutations or polymorphisms in the untranslated
regions may as well be causative of the disease so that
sequence portions in the form of probes and the like in
regions other than the region of significant IAP homology
may be valuable in the diagnosis of SMA. It is also
understood that the sequence information of sequence ID
No. 1 may be used in the construction of suitable cloning
vectors for purposes of producing multiple copies of the
gene or expression vectors for purposes of transfecting a
host to produce significant quantities by recombinant
techniques of the NAIP protein. Sections or fragments or
full-length sequence information may be used in the
construction of the cloning vectors or expression vectors
depending upon the end use of such vectors. With this
understanding, the details in respect of the
identification of the SMA disease gene, its
characteristics, the corresponding protein sequence and
their uses in diagnosis are explained.
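
The stated boundaries of the encoding portion can be checked with simple arithmetic. The sketch below assumes that both quoted positions are 1-based and refer to the first base of their codon (the A of ATG and the T of TAA); the text does not state this explicitly, so the result should be read as a consistency check under that assumption only. The function name `coding_span` is hypothetical.

```python
START_CODON_BASE = 396   # ATG position given in the text
STOP_CODON_BASE = 4092   # TAA position given in the text (assumed: first base of the stop codon)


def coding_span(start_base: int, stop_base: int) -> tuple[int, int]:
    """Return (coding nucleotides excluding the stop codon, number of codons)."""
    coding_nt = stop_base - start_base
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("start and stop positions are not in the same reading frame")
    return coding_nt, coding_nt // 3


nucleotides, codons = coding_span(START_CODON_BASE, STOP_CODON_BASE)
print(nucleotides, codons)  # 3696 1232
```

Under this assumption the two positions lie in the same reading frame, which is what one would expect of a start codon and its matching stop codon.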

A YAC contig of the Spinal Muscular Atrophy (SMA)
disease gene region along chromosome 5q13 was produced
which incorporated the D5S435-D5S112 interval and
encompassed 4 megabases. The CATT-40G1 sublocus on the
cosmid array showed significant linkage disequilibrium
with Spinal Muscular Atrophy, indicating close proximity
to the gene. However, delineation of the precise region
containing the SMA gene was not possible based on this
information alone. A PAC contiguous array containing the
CATT region, comprising 9 clones and extending
approximately 400 kb, was constructed. The genetic
analysis combined with the physical mapping data
indicated that the 154 kb PAC clone 125D9 (Figure 7),
which contained the CMS allele 9 and the 40G1 CATT
sublocus, had a good probability of containing the SMA
locus. Through further analysis, as will be described,
PAC 125D9 was found to contain the gene encoding neuronal
apoptosis inhibitor protein.

10 pYAC (yeast artificial chromosome plasmids) allow 

direct cloning into yeast of contiguous stretches of DNA 
^ 400 kb. Circular pYAC plasmids (without inserts) can 
replicate in E. coli. In vitro digestion of pYAC,
ligation to exogenous DNA, and direct transformation of
the subsequent linear molecules (with telomeric sequences
at each terminus) into yeast generates a library that can
be screened by standard techniques.

Large YAC constructs are as stable as natural 
chromosomes. They are good vectors for the construction 

20 of libraries from complex genomes such as the human 

genome. In addition, sequences which are unclonable in
E. coli cosmid and lambda vectors are successfully cloned
in YAC vectors.

YAC vectors are normally propagated in bacteria as 

25 circular plasmids. Restriction enzyme target sites are 
arranged to produce two arms upon digestion, each of 
which contains a different selectable marker and 
terminates at one end in a telomere, the other in a blunt 
end. In addition, one of the arms contains an ARS 

30 element. The two arms are purified away from a linking 
fragment and ligated with donor DNA fragmented so as to 
leave blunt ends. The ligation mixture is used to
transform yeast cells, and the selection conditions are
such as to require the presence of both arms. The insert
interrupts a third selectable marker, which allows
non-recombinant structures to be recognized.

Construction of YAC Contig

YAC clones were isolated from three libraries,
constructed at the National Centers of Excellence (NCE,
Toronto), the Imperial Cancer Research Fund (ICRF,
London) (Larin et al., 1991) and the Centre d'Etude du
Polymorphisme Humain (CEPH, Paris) (Albertson et al.,
1990), all of which were prepared from partial EcoRI
digests of total DNA ligated into the YAC vector pYAC4.
ICRF YAC clones were identified by probing library
filters with 5q13.1 probes. YAC DNA from the NCE library
was screened by PCR amplification, electrophoresed,
immobilized onto Southern blots and hybridized with the
radiolabelled STS product to identify positives.
Numerous positives were obtained repeatedly in both the
initial round of PCR of pooled plates and the second
round with the plate(s) thought to contain the clone of
interest, many of which proved to be false positives. The
number of false positives obtained, which appeared to be
primer dependent, was reduced by radiolabelling PCR
products and resolving these on 6% polyacrylamide gels.
The true positives could then be sized accurately without
interference from spurious products.

Yeast strains with YACs positive for 5q13.1 STSs
were grown on selective plates and examined for stability
in the following manner: 4 colonies of each were grown
for preparation in agarose blocks; yeast chromosomal DNA
was separated by pulsed field gel electrophoresis and
transferred to filters; and the size and number of YAC
clones contained within each yeast colony were determined
by hybridization with radiolabelled total human genomic
DNA. Positive clones were confirmed either by
hybridization or PCR amplification with the original
probe. Only YAC 24D6-2 contained some colonies with more
than one YAC.

YAC end clones and inter-Alu products were isolated
by vector-Alu PCR and inter-Alu PCR respectively. The
location of these products within 5q11-13 was confirmed
by hybridization to Southern filters of the somatic cell
hybrids HHW105 (Dana et al., 1982), containing the entire
chromosome 5, and HHW1064 (Gilliam et al., 1989), a
derivative containing chromosome 5 with a deletion at
5q11.2-13.3. Many of these probes demonstrated
hybridization profiles indicative of locations both
within the 5q11-13 region and elsewhere on chromosome 5.
In some cases primers specific for the ends of each YAC
were generated from the sequences of YAC end clones
isolated by vector-Alu PCR. The mapping of each new STS
to 5q11-13 was determined by PCR amplification of DNA
from the somatic cell hybrids HHW105 and HHW1064. In a
few cases it was found that a primer pair contained a
chromosome 5 repetitive sequence, as the PCR amplified
products from both HHW1064 and HHW105 were positive.
Formulation of new STS primers resulted in the
amplification of products specific to the 5q11-13 region.
End clone hybridization and STS analysis performed on all
YACs confirmed the orientation and location of each YAC.
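
The two-hybrid mapping logic described above can be summarised as a small decision table. The sketch below is an illustrative reading of that logic, assuming a simple positive/negative call for each hybrid; the function name `classify_marker` and the returned labels are hypothetical.

```python
def classify_marker(positive_hhw105: bool, positive_hhw1064: bool) -> str:
    """Interpret PCR or hybridization results on the two somatic cell hybrids.

    HHW105 carries an entire chromosome 5; HHW1064 carries chromosome 5
    with the 5q11.2-13.3 region deleted.
    """
    if positive_hhw105 and not positive_hhw1064:
        # Present on chromosome 5 but absent once 5q11.2-13.3 is deleted.
        return "maps within 5q11.2-13.3 only"
    if positive_hhw105 and positive_hhw1064:
        # Still detected after the deletion: located elsewhere on chromosome 5,
        # or a chromosome 5 repeat that is also present outside the deletion.
        return "present on chromosome 5 outside the deletion (possible repeat)"
    if not positive_hhw105 and not positive_hhw1064:
        return "not detected on chromosome 5"
    return "inconsistent result; repeat the assay"


print(classify_marker(True, False))
print(classify_marker(True, True))
```

A primer pair giving the second outcome corresponds to the case in the text where new STS primers had to be formulated.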

The assembly of a contiguous array of YACs covering
the SMA interval was initiated from two markers which
flank SMA: D5S125 (Mankoo et al., 1991), which lies
centromeric to D5S435, and the more telomeric marker
D5S112 (Lien et al., 1991) (see Figure 1). Six YACs were
identified in the ICRF library by the telomeric marker
pJK53 (D5S112). One of these YACs, D06100, was shown to
extend the furthest centromerically based on end clone
STS analysis. The centromeric end of this YAC identified
two YACs from the NCE library, 1281 and 1284. YACs
positive for the D5S125 or D5S435 markers were not found
in the ICRF or NCE library; thus the CEPH library was
screened, from which clones containing D5S435 were
isolated. A microsatellite polymorphism mapping into the
center of the gap, CATT-1 (Burghes et al., 1994), was
utilized to detect three YACs, 24D6-2, 27H5 and 33H10.
These YACs were shown to be linked to both the
centromeric and the telomeric YACs (1281, 1284) by STS
analysis. Internal YAC products generated by Alu-PCR were
utilized to probe all YACs, establishing the degree of
overlap. STS sequences (Kleyn et al., 1993) mapping
between JK348 and D5S112 were utilized to confirm the
degree of overlap and the orientation of YACs in the
contig. Concurrently the order of each STS along 5q13
was confirmed. In all a total of 14 YACs were
identified, anchored by the genetic markers D5S435,
D5S629, CMS-1, CATT-1, D5F153, D5F149, D5F150, D5F151,
D5S557 and D5S112.

Long Range Restriction Map and
Estimation of Long Range Physical Distance

A restriction map of the critical SMA region was
constructed from the STS Y116U (Kleyn et al., 1993),
approximately 100 kb proximal to D5S629, to the STS Y107U
(Kleyn et al., 1993), which lies approximately 500 kb
distal to D5S557 (see Figure 2). In order to detect any
possibility of deletions or rearrangements in our YACs,
additional YACs isolated from the CEPH library (Kleyn et
al., 1993), mapping within this region, were included in
the analysis. YACs 24D6-2, 27H5, 33H10, 155H11, 76C1,
235B7, 184H2, 428C5, and 81B11 (Kleyn et al., 1993) were
partially digested utilizing the rare cutter restriction
endonucleases NotI, BssHII, SfiI, and RsrI. Southern
blots of the Pulse Field Gel Electrophoresis (PFGE)
separated restriction products were hybridized with YAC
left arm and right arm specific probes which revealed the
positions of cleavage sites from both ends of each YAC.
The orientation and overlap of the YACs had been
previously determined based on STS analysis; therefore
the positions of the rare cutter sites among the
overlapping YACs were compared. By aligning the
overlapping YACs at their common rare cutter sites, the
degree of overlap could be more precisely determined.
The long range restriction map of the overlapping YACs
derived from different sources was mostly in agreement
with the exception of 33H10 and 428C5. 428C5 has
previously been documented to contain a deletion (Kleyn
et al., 1993), evident by comparison of its STS content
and its size of only 300 kb, indicating that it lies
further centromeric than its placement in Figure 2. YAC
33H10, based on STS analysis, contains an internal
deletion and YAC 155H11 is chimeric at its telomeric end;
therefore rare cutter sites at the telomeric end of the
map which could not be confirmed were not included. The
results indicate the distance from the centromeric
boundary D5S435 to the telomeric boundary D5S557 to be
1.4 Mb, in marked contrast to 400 kb as previously reported
(Francis et al., 1993) but in agreement with one other
estimate (Wirth et al., 1993). Furthermore, the D5S629-
D5S557 interval can be estimated at 1.1 Mb and the
distance of the genetically defined CMS1-SMA-D5S557
interval is approximately 550 kb.
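
The way arm-specific probes on a partial digest reveal site positions from both ends can be sketched as follows. With a left arm probe, each detected partial-digest fragment extends from the left end to a cleavage site, so its size is the site position; with a right arm probe, the site position is the YAC size minus the fragment size. The function name, the 10 kb tolerance and all fragment sizes below are hypothetical and chosen only to illustrate the arithmetic; they are not data from Figure 2.

```python
def sites_from_partial_digest(yac_kb, left_frags_kb, right_frags_kb, tol_kb=10.0):
    """Return site positions (kb from the left end) seen from both arms.

    Fragments equal to the full YAC size (uncut molecules) are ignored.
    A site is kept when the two independent estimates agree within tol_kb.
    """
    from_left = [f for f in left_frags_kb if f < yac_kb]
    from_right = [yac_kb - f for f in right_frags_kb if f < yac_kb]
    confirmed = []
    for pos in sorted(from_left):
        matches = [r for r in from_right if abs(r - pos) <= tol_kb]
        if matches:
            nearest = min(matches, key=lambda r: abs(r - pos))
            confirmed.append(round((pos + nearest) / 2, 1))
    return confirmed


# Hypothetical 400 kb YAC with two rare cutter sites.
print(sites_from_partial_digest(400, [120, 250, 400], [155, 280, 400]))
# [120.0, 247.5]
```

Sites confirmed from both ends in this way can then be aligned between overlapping YACs to refine the extent of overlap.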

Cosmid Contig Assembly from the Chromosome 5 Library 

Although the isolation of cosmids utilizing whole
YACs as probes could be an expeditious method of
constructing a cosmid contig, in this case the presence
of chromosome 5 specific repeats would likely result in
the isolation of cosmids mapping elsewhere on chromosome
5. A directed cosmid walking strategy was thus adopted.
The CATT-1 STR, which has been shown by irradiation
hybrid analysis to map approximately midway between the
two flanking markers D5S435 and D5S351 (Hudson et al.,
1992), was utilized as the initiation point for the
construction of a cosmid clone array. The complex
pattern of amplification seen on genomic DNA, with two to
eight alleles per individual (see Figure 3), suggested a
variable number of copies or loci of the CATT-1 sequence
in this region. Thirty CATT-1 positive cosmids were
identified which upon PCR analysis were seen to contain
one of four distinct alleles (see Figure 3). As the
cosmid library was derived from a monochromosomal source,
this confirmed that the CATT STR exists in at least four
locations, which we refer to as subloci. These subloci
are referred to as CATT-40G1, CATT-192F7, CATT-58G12 and
CATT-250B6, based on the cosmid addresses of the first
cosmids identified containing alleles of 12, 19, 15 and
20 cytosine adenosine (CA) dinucleotides respectively.
Bi-directional walking was initiated from these 4 cosmid
subloci. Positive hybridization was observed for cosmid
250B6 with one end of 58G12 and for 192F7 with the other
end, resulting in the ordering cen-192F7-58G12-250B6-tel
(Figure 4). All cosmids which contained the CATT-192F7
allele were mapped to this location based on the
size of their CATT-1 allele and their restriction enzyme
profiles. As shown in Figure 4, the CATT-192F7 sublocus
is telomeric to the STR CMS-1, which itself lies
telomeric to the CATT-40G1 sublocus.

Due to the presence of chromosome 5 specific
repetitive sequences, resulting in the identification of
cosmids from another region of chromosome 5, the
integrity of the contig was verified with each step
taken. Cosmid end clones generated by vector-Alu-PCR
were hybridized to somatic cell hybrid panels as
described above. As repetitive sequences which map
solely to the region of chromosome 5 that is deleted in
the hybrid cell line HHW1064 have been observed, cosmids
identified by end products which did not hybridize to
HHW1064 were analyzed further. Proof of overlap was
shown by hybridization of end clones, single copy probe
hybridization, STS content, and restriction enzyme
profile comparison. Cosmids identified by end clones
which hybridized to HHW1064 were eliminated and walking
was continued by utilizing a different inter-Alu product
from the clone of origin, which was verified in the same
manner. Cosmid sizes were calculated by the addition of
EcoRI restriction fragments and the extent of overlap was
determined by the addition of those fragments in common.
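
The fragment-summing calculation in the last sentence can be sketched as follows. The EcoRI fragment sizes are hypothetical, and the sketch assumes that fragments of equal size in two cosmids are the same fragment, which in practice would be supported by the hybridization and STS evidence described above.

```python
from collections import Counter


def cosmid_size_kb(fragments_kb):
    """Cosmid insert size as the sum of its EcoRI fragment sizes."""
    return sum(fragments_kb)


def overlap_kb(fragments_a_kb, fragments_b_kb):
    """Extent of overlap as the sum of fragments common to both cosmids.

    Counter intersection keeps a shared size once per matching pair, so a
    size that occurs twice in one cosmid and once in the other counts once.
    """
    shared = Counter(fragments_a_kb) & Counter(fragments_b_kb)
    return sum(size * count for size, count in shared.items())


# Hypothetical EcoRI fragment sizes (kb) for two overlapping cosmids.
cosmid_a = [12, 8, 6, 4, 3]
cosmid_b = [6, 4, 3, 9, 7]

print(cosmid_size_kb(cosmid_a))        # 33
print(cosmid_size_kb(cosmid_b))        # 29
print(overlap_kb(cosmid_a, cosmid_b))  # 13
```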

Cosmid Contig Assembly of YAC 76C1 Cosmids

As extension of the cosmid contiguous array was
prevented by the presence of chromosome 5 specific
repeats, a 5X cosmid library was produced from YAC 76C1.
The STSs CATT-1, CMS-1, Y122T (Kleyn et al., 1993), Y97T
(Kleyn et al., 1993) and Y98T (Kleyn et al., 1993), which
are distributed along the YAC, were utilized to identify
cosmids to assemble the contig. As well, the previously
developed markers pZY8, pL7, pGA-1, p15.1, p402.1,
p2281.8 and β-glucuronidase (Oshima et al., 1987) (Table
2, Figure 4) from the established cosmid contig were
hybridized to the library, providing an effective method
of ordering the cosmids. Cosmids demonstrating irregular
hybridization patterns and thought to contain deletions
and/or rearrangements were excluded.

The STS Y98T identified three cosmids including one
previously identified by the probe p2281.8, derived from
a chromosome 5 library clone, 228C8, also containing the
STS Y98T. An end product of this cosmid hybridized to
ten cosmids. Concurrently, an end fragment of a CATT-40G1
sublocus was shown to hybridize to four of these ten
cosmids, thus linking CATT-40G1 and CMS-1 with the more
centromeric STS Y98T (Figure 4). We were unable to
identify any clones containing the YAC end STS Y97T.
Filter hybridization and STS mapping experiments
indicated a second more telomeric location of the
CATT-40G1 sublocus. A duplication of this sublocus would
agree with genotype data in our SMA kindreds (McLean et
al., in press).

An EcoRI restriction map was generated utilizing a
minimal set of cosmids necessary to span the region. To
ensure the reliability of the contig, we sought to
integrate it with the contig constructed from the
chromosome 5 specific library. Concordance of the
contigs was evident by comparison of the restriction
maps, the position of probes and STSs on the map and Alu-
PCR fingerprinting. In this manner the size of the
contig was estimated to be 210 kb. A directed walking
strategy has thus resulted in the generation of a single
contiguous set of cosmids containing the CATT-1 cluster
of subloci with known centromere/telomere orientation.

Duplications/Deletions

Several lines of evidence suggested the presence of
genomic sequence duplications within our cosmid array.
We provide evidence for the duplication of the CATT-40G1
sublocus in cosmids derived from a single chromosome 5.
A centromeric location for this sublocus was established, as
the CATT-40G1 sublocus was found to be contiguous with
the STSs Y122T, Y88T and CMS-1 in several cosmids, and
the centromeric YAC 428C5 is positive for probes isolated
from the CATT-40G1 containing cosmids. Although YAC
428C5 does not contain the CATT-40G1 sublocus upon PCR
amplification, this may be explained either by a null
allele in the chromosome from which the YAC was derived
or a deletion in the YAC. We have previously observed
null alleles in individuals at distinct CATT-1 subloci.
A second more telomeric location of CATT-40G1 was
determined by the hybridization to CATT-40G1 cosmids of
the probes pGA-1, pL7, and pZY8, all of which bind the
more telomeric YACs 33H10 and 24D6-2. The hybridization of
p402.1, derived from cosmid 40G1, to cosmids at both
locations would indicate that the duplication is not
restricted to the CATT-40G1 sublocus and likely
encompasses a larger region. Southern blot analysis
revealed distinct profiles of cosmids for the two
locations; however, common bands were detected by Alu-PCR
fingerprinting, supporting a duplication.

Correlation of our YAC contig with the cosmid contig
revealed that YACs 76C1, 81B11, and 27H5 span the 150 kb
CATT region of 5q13. Despite this, CATT-1 genotyping of
these YACs revealed only one allele size, raising the
possibility that the chromosomes from which these YACs
were derived (4 in all) contain null alleles at their
remaining CATT-1 subloci. Our experience, however, with
CATT linkage analysis of SMA families indicated that such
a scenario is highly unlikely, as none of the
approximately 300 individuals genotyped had fewer than 2
alleles. We consequently believe it is more likely that
these CATT subloci are unstable and have been deleted
during YAC construction and/or propagation.

Sequence comparison between the CATT-1 and D5F153
primer sequences indicated that these two STRs were
similar and possibly the same, as one primer is identical
and the other primer sequences overlap by eight
nucleotides. However, the centromeric YACs 428C5,
232F12, 235B7, 184H2, and the telomeric YACs 12H1,
155H11, 269A6, which were CATT-1 negative, yielded D5F153
amplification products, indicating that CATT-1 may be a
derivative of D5F153. These data, in combination with
D5F153 analyses of the cosmid contig, which contains
three D5F153 loci (Figure 4), indicated that at least
five D5F153 subloci exist.

In addition to the CATT-1 and D5F153 STRs, the STRs
CMS-1 and D5F150 were present in a variable number of
copies per chromosome 5. STS analysis localized CMS-1 to
YACs 428C5, 76C1, 81B11 and 27H5 with allele sizes of 5,
4, 4 and 3, and 4 respectively. PCR amplification of
genomic DNA revealed up to four alleles per individual,
indicating as many as two copies per chromosome. D5F150
was present at two locations within the cosmid array, yet
only one location was detected in the YAC contig. D5F151
was not detected within our cosmid array; nevertheless it
was placed at the centromeric end of YAC 33H10, which
encompasses the cosmid array, based on the positive
amplification of YAC 428C5. One location of D5F149 was
detected on both our cosmid and YAC clones. Our data
suggested, as with CATT-1, the existence of null alleles
and/or instability of the CMS-1, D5F150, D5F151 and D5F149
sequences in YACs.
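
The inference from the number of alleles per individual to the number of copies per chromosome can be written as a one-line calculation. The sketch assumes a diploid individual in which each sublocus contributes at most one distinguishable allele per chromosome; the function name is hypothetical.

```python
import math


def min_copies_per_chromosome(max_alleles_per_individual: int) -> int:
    """Smallest number of subloci per chromosome that can account for the
    largest number of distinct alleles seen in one diploid individual."""
    return math.ceil(max_alleles_per_individual / 2)


print(min_copies_per_chromosome(4))  # 2, as stated for CMS-1
print(min_copies_per_chromosome(8))  # 4, matching the eight CATT-1 alleles noted earlier
```

Because null alleles and alleles of identical size reduce the number of bands observed, the value is a lower bound on the copy number rather than an exact count.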

A deletion event was observed in hybridization with
an 800 bp EcoRI fragment isolated as a single copy probe
from the CATT-40G1 containing cosmid 234A1 from the
chromosome 5 specific cosmid library. Probings of YAC
DNA failed to detect this fragment in any of our YACs.
Hybridization to genomic DNA of several individuals did
not identify any deletion events; thus this sequence may
be susceptible to instability in the YACs. Sequencing of
this fragment did not reveal any exons or coding region.

Further evidence of sequence duplication in the SMA
region was identified with a 1.2 kb internal Alu-PCR
product (P151.2) from cosmid 15F8 (Figure 4). The probe
identified three EcoRI fragments in YAC clones 76C1,
81B11 and 27H5 (20 kb, 12 kb and 3 kb) but only one in
33H10 and 24D6 (20 kb) and one in 428C5 (12 kb). An
internal EcoRI site divided this marker into 500 bp and
700 bp probes. The larger probe identified the 12 kb and
20 kb fragments while the smaller probe identified the 3
kb and 20 kb fragments (Figure 5). We ruled out
instability of this sequence in YACs as they are from
different libraries and the hybridization patterns
reflected their physical location. The 12 kb and 3 kb
fragments were localized on the EcoRI restriction map;
however, we were unable to position the 20 kb fragment.
Taken together these findings suggest the 12 kb and 3 kb
fragments lie in tandem with a centromeric/telomeric
orientation respectively. A location of the 20 kb fragment
distal to our contiguous array of cosmids may be inferred
from the data. The duplication was confirmed by hybridization
to genomic DNA digests revealing all three fragment sizes.

YAC Contig and Cosmid Contig Characteristics 

We established a YAC contig of the SMA disease gene
region, incorporating the D5S435-D5S112 interval and
encompassing 4 Mb. Orientation of the contig along 5q13
was confirmed by analysis of seven genetic markers and
STSs in combination with PFGE analysis. The long range
restriction map revealed neither major deletions nor
rearrangements among the YACs within our contig, and was
utilized to refine the estimates of the size of the
contig. Our YAC map establishes physical linkage of the
markers D5S629, D5F153, D5F151, D5F150, D5F149, CMS-1,
CATT-1 and D5S557 to a 1.1 Mb region, a region of the
genome characterized by low copy repetitive sequences and
multilocus STRs. Furthermore, we estimated the new
genetically defined CMS1-SMA-D5S557 interval to be 550 kb.
Estimates of the physical distance of the D5S435-D5S557
interval ranging from 400 kb (Francis et al., 1993) to
1.4 Mb (Wirth et al., 1993) have been reported. In
contrast to these studies, our estimation of 1.4 Mb for
the D5S435-SMA-D5S557 interval and 550 kb for the
CMS1-SMA-D5S557 interval employs clones derived from three
sources, comprising 6 chromosomes. Moreover, the
determination of both the size of clones and the position
of rare cutter sites has enabled us to determine more
precisely the extent of overlap of the YACs and the size
of the contig, providing a reliable estimation.

We also assembled a single contiguous array of
cosmid clones derived from both a chromosome 5 specific
library and a YAC (76C1) specific library in conjunction
with a restriction map of the
CMS-1/CATT-1/D5F153/D5F150/D5F149 region encompassing 210 kb.
The repetitive sequences prevented extension of the cosmid
contig when utilizing a chromosome 5 specific library,
necessitating construction of a cosmid library from YAC 76C1
in the critical region. The contiguous cosmid array was
constructed by a directed walking strategy with
validation of cosmid overlap established by restriction
enzyme fragment overlap, Alu fingerprinting, and analyses
involving STSs, cosmid end clones and single copy probes.
Physical and genetic mapping analyses revealed a
complex region of genomic DNA comprising duplications and
the presence of repetitive sequences. Genotyping of
genomic DNA with complex STRs from this region revealed
the presence of a polymorphic number of bands ranging as
high as eight per individual. This suggested the
presence of multiple copies, or subloci, for the STRs
CATT-1, CMS-1, D5F153 and D5F150. Our physical mapping data
confirmed the presence of these subloci except in the
case of D5F151 and D5F149, which revealed only one
location. Four of the CATT-1 subloci map to our cosmid
array within a 140 kb region; at least one of these
subloci, CATT-40G1, is duplicated. D5F153 and CATT-1 are
related STRs which appear to have diverged from a common
ancestor. We had localized one CMS-1 sublocus to our
cosmid array; however, we were unable to determine from
our data whether other subloci exist on other chromosomes
within this 200 kb interval, as the chromosomes from
which the YAC/cosmid libraries were derived may either
contain null alleles at the remaining subloci or have
sustained deletions.

The CATT-1, D5F153, D5F150 and D5F149 STRs, although
present in multiple copies on chromosomes in the
population, were observed as single sublocus markers on
all YACs, as evidenced by single allele PCR products for
each, suggesting instability and deletion of these
sequences. This is supported by the absence in our YACs
of an 800 bp fragment, derived from the chromosome 5
cosmid library based contiguous array. Instability of
these sequences does not appear to result in large
deletions, as additional unique sequence probes located
between the multiple subloci are retained in the YACs.
In summary, we have produced the first high
resolution physical map of the critical SMA region.
However, delineation of the precise region which
contained the SMA gene was not possible based on this
information alone.

Concurrent with our genetic analysis, we constructed
a YAC contiguous array employing clones from three
different YAC libraries (Roy et al., 1994). A minimal
representation from this array, which was correlated with
extensive pulsed field gel electrophoresis (PFGE)
analysis, is shown in Figure 9B.

With the initial suggestion of linkage
disequilibrium of the general CATT marker and SMA
(Burghes et al., 1994), the construction of a cosmid
contiguous array incorporating the extended CATT region
was undertaken. The presence of extensive and
polymorphic genomic repetitive elements mapping both to
5q13 and elsewhere on chromosome 5 interfered with a
straightforward assembly of a contiguous array. However,
the integrity of the array was established by restriction
enzyme analyses, Alu-PCR fingerprinting, STS content
determination and nucleic acid hybridization using cosmid
end clones and other single copy probes. This resulted
in the generation of an array encompassing 220 kb that
contained the five CATT subloci contained in a
monochromosomally derived flow sorted chromosome 5 genomic
library (Roy et al., 1994). More recently, a P1
artificial chromosome (PAC; Ioannou et al., 1994)
contiguous array containing the CATT region, comprising
10 clones and extending approximately 550 kb, was
constructed (Figure 9C).

Linkage Disequilibrium Analysis

A linkage disequilibrium analysis employing 5
complex and simple tandem repeats mapping to the SMA
region was conducted. Two of the polymorphisms employed
in this analysis were the CATT-40G1 and CATT-192F7
subloci which we mapped to our cosmid array. Specific
amplification of the two individual subloci was achieved
by constructing primers ending on sequence polymorphisms
in the region flanking the CA repeat. A clear linkage
disequilibrium peak was observed at the CATT-40G1
sublocus as shown in Figure 6.

PAC Contig Array

Since the 40G1 CATT sublocus demonstrated linkage
disequilibrium, a PAC contiguous array containing the
CATT region was constructed. This PAC contig array
comprised 9 clones and extended approximately 400 kb
(Figure 7). Our genetic analysis combined with the
physical mapping data indicated that the 40G1 CATT
sublocus marker, which showed the greatest disequilibrium
with SMA, was duplicated and was localized at the extreme
centromeric end of the critical SMA interval. Consequently
the 154 kb PAC clone 125D9, which contained within 10 kb
of its centromeric end the SMA interval defining CMS
allele 9 and extended telomerically to incorporate the
40G1 CATT sublocus, was chosen for further examination.

Two genomic libraries were constructed by performing
complete and partial (average insert size 5 kb) Sau3AI
digests on PAC 125D9 and cloning the restricted products
into BamHI digested Bluescript plasmids. Genomic sequencing
was conducted on both termini of 200 clones from the 5 kb
insert partial Sau3AI library in the manner of Chen et
al. (1993), permitting the construction of contiguous and
overlapping genomic clones covering most of the PAC.
This proved instrumental in the elucidation of the
neuronal apoptosis inhibitor protein gene structure.
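
Cloning Sau3AI fragments into a BamHI digested vector works because both enzymes leave the same 5'-GATC overhang (Sau3AI cuts before GATC, BamHI cuts between the two G residues of GGATCC). The sketch below illustrates this and lists the fragment lengths of a complete Sau3AI digest. The helper names and the 23-base example sequence are hypothetical and stand in for real PAC sequence.

```python
SAU3AI = ("GATC", 0)    # recognition site and cut offset on the top strand: ^GATC
BAMHI = ("GGATCC", 1)   # G^GATCC


def overhang(site: str, cut_offset: int) -> str:
    """5' overhang left by a palindromic site cut cut_offset bases from each 5' end."""
    return site[cut_offset:len(site) - cut_offset]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site in seq."""
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def complete_digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths of a linear sequence after cutting at every site."""
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical sequence used only for illustration.
example = "AAGATCTTTGGATCCAAGATCAA"

print(overhang(*SAU3AI), overhang(*BAMHI))   # GATC GATC
print(find_sites(example, SAU3AI[0]))        # [2, 10, 17]
print(complete_digest(example, *SAU3AI))     # [2, 8, 7, 6]
```

A partial digest cuts only a subset of these sites, which is how the larger overlapping inserts (average 5 kb in the text) are obtained.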

PAC 125D9 is cleaved into 30 kb centromeric and 125
kb telomeric fragments by a NotI site (which was later
shown to bisect exon 7 of the PAC 125D9 at the beginning
of the apoptosis inhibitor domain). The NotI PAC
fragments were isolated by preparative PFGE and used
separately to probe fetal brain cDNA libraries. Physical
mapping and sequencing of the NotI site region was also
undertaken to assay for the presence of a CpG island, an
approach which rapidly detected coding sequences. The
PAC 125D9 was also used as a template in an exon trapping
system, resulting in the identification of the exons
contained in the neuronal apoptosis inhibitor protein
gene.
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The multipronged approach, in addition to the 
presence of transcripts identified previously by 
hybridization by clones from the cosmid array (such as, 
GAl and L7} , resulted in the rapid identification of six 
5 cDNA clones contained in neuronal apoptosis inhibitor 
protein gene. The clones were arranged, where possible, 
into overlapping arrays. Chimerism was excluded on a 
number of occasions by detection of co-linearity of the 
cDNA clone termini with sequences from clones derived 
10 from the PAC 125D9 partial Sau3Al genomic library. 

Cloning of Neuronal Apoptosis Inhibitor Protein Gene 
In the meantime, a human fetal spinal cord cDNA
library was probed with the entire genomic DNA insert of
cosmid 250B6 containing one of the 5 CATT subloci. This
resulted in the detection of a 2.2 kb transcript referred
to as GA1, the location of which is shown in Figure 7.
Further probings of fetal brain libraries with the
contiguous cosmid inserts (cosmid 40G1) as well as single
copy subclones isolated from such cosmids were undertaken.
A number of transcripts were obtained including one termed
L7. No coding region was detected for L7, probably due to
the fact that a substantial portion of the clone
contained unprocessed heteronuclear RNA. However, we
later discovered that L7 comprises part of what
is believed to be the neuronal apoptosis inhibitor
protein gene. Similarly, the GA1 transcript ultimately
proved to be exon 13 of the neuronal apoptosis inhibitor
protein gene. Since GA1 was found to contain exons,
indicating that it was an expressed gene, it was of
particular interest. The GA1 transcript, which was
contained within the PAC clone 125D9, was subsequently
extended by further probing in cDNA libraries.

The extended GA1 transcript was compared to other
known sequences to reveal that its amino acid sequence
had significant homology to the inhibitor of apoptosis
polypeptides of Orgyia pseudotsugata and Cydia pomonella
viruses (Table 3). This sequence analysis revealed the
presence of inhibitor of apoptosis protein homology in
exons 5 and 6.

The remaining gaps in the cDNA were completed and
the final 3' extension was achieved by probing a fetal
brain library with two trapped exons. A physical map of
the cDNA with overlapping clones was prepared. The
entire cDNA sequence is shown in Table 4 and contains
sixteen exons. The amino acid sequence starts with
methionine which corresponds to the nucleotide triplet
ATG. Figure 8 demonstrates the structural organization
of the SMA gene.

The cDNA sequence of NAIP shown in Table 4 allows
one skilled in the art to develop from this gene
primers, probes and also antibodies against the protein
product. The cDNA sequence of Table 4 may be used in
recombinant DNA technology to express the sequence in an
appropriate host in order to produce the neuronal
apoptosis inhibitor protein. In this manner, a source of
neuronal apoptosis inhibitor protein is provided. Given
the sequence of NAIP and the probes and primers therein,
deletions in the sequence may also be detected, for
instance, in the disorder Spinal Muscular Atrophy.

NAIP Structure

The NAIP gene contains 17 exons comprising at least
5.5 kb and spans an estimated 80 kb of genomic DNA. The
NAIP coding region spans 3698 nucleotides resulting in a
predicted gene product of 1233 amino acids. NAIP
contains two potential transmembrane regions and an
intracellular inhibitor of apoptosis domain immediately
contiguous with a GTP binding site. Searches of the
protein domain programs generated the following results:

(i) residues 9-91: an N terminal domain with no
recognizable motifs.

(ii) residues 94-118: hydrophobic potential
membrane spanning domain.

(iii) residues 169-485: a domain which shows
homology with apoptosis inhibitors, followed by a GTP/ATP
binding site immediately before the next hydrophobic
domain.

(iv) residues 486-504: a hydrophobic potential
membrane spanning domain.

(v) residues 505-1005: possible receptor domain
containing 4 N-linked glycosylation sites and a
lipoprotein binding domain.

Neuronal Apoptosis Inhibitor
Protein Gene Mutational Analysis

A cDNA probe, 20.3, was found by using the entire PAC
125D9 as a probe to screen cDNA libraries. Probing of
genomic Southerns with cDNA probe 20.3 revealed the
absence of a 9 kb EcoRI band in a Type III consanguineous
family. This information mapped the NAIP gene deletions
to exons 5 and 6. Thus the deletion covers the exon
containing the rare NotI restriction site and the exon
immediately downstream. Primers in and around these
exons were constructed, revealing the absence of
amplification from 3 Type I and 3 Type III SMA
individuals. Genomic DNA was isolated from PAC and
cosmid subclones in and around exons 4 and 5 and
sequenced in an effort to generate primers which would
amplify the junction fragment generated by the causative
deletions as depicted. A junction fragment was detected
in the Type III individual. A similar product was
observed in two other French Canadians with no history of
consanguinity. The chromosomes of the 3 Type I and 3
Type III SMA individuals had identical CATT/CMS
haplotypes, strongly suggesting that this is a common mild
SMA mutation and comparatively frequent in the French
Canadian population. Cosegregation of this pattern was
demonstrated. We have conducted analysis of 110 parents
of SMA individuals and have failed to find a similar
product. Sequencing of the genomic DNA in this region
revealed an approximately 10 kb deletion resulting in an
in frame deletion. This deletion spans intron regions
and exons 5 and 6. Southern blot analysis of two
generation SMA families was performed. A cDNA probe
encompassing the first eight exons was hybridized to
EcoRI-digested DNA from peripheral blood leukocytes. SMA
affected members show an absence of hybridization to a 10
kb EcoRI band which was shown to contain exons 5 and 6
(Figure 9).

Initial isolation of the NAIP transcript was
achieved by probing a human fetal brain cDNA library with
the entire 28 kb genomic DNA insert of cosmid 250B6 that
contains one of five CATT subloci present in the cosmid
library. This resulted in the detection of a 2.2 kb
transcript that ultimately proved to be exon 14 of the
NAIP gene. Further probing of fetal brain libraries with
the contiguous cosmid inserts (cosmid 40G1), as well as
single copy subclones isolated from such cosmids,
identified a number of transcripts including the L7
transcript that ultimately proved to contain exon 13 of
the NAIP locus. No coding region was detected for L7,
probably due to the fact that a substantial proportion of
the clone contained unprocessed heteronuclear RNA,
obscuring its true nature.

At this stage, the completed genetic and linkage
disequilibrium analyses and construction of the PAC
contiguous array identified PAC 125D9 as having a good
probability of containing the SMA locus. Four PAC 125D9
genomic libraries were constructed by performing
complete and partial (average insert size 5 kb) Sau3AI,
BamHI and BamHI/NotI digests on the PAC insert and
cloning the restricted products into plasmid vector.
High throughput genomic sequencing was conducted on both
termini of 200 clones from the 5 kb insert partial Sau3AI
digestion library in the manner of Chen et al. (1993),
permitting the construction of contiguous and overlapping
genomic clones covering most of PAC 125D9 (data not
shown). This has proven instrumental in elucidating the
gene structure of the NAIP locus.

PAC 125D9 is divided into 24 kb centromeric and 130
kb telomeric fragments by NotI digestion, bisecting exon
6 of the NAIP gene at the beginning of the first
potential transmembrane domain mapping upstream of the
inhibitor of apoptosis homologous domains (Figure 11 and
Table 4). The NotI PAC fragments were isolated by
preparative PFGE and used separately to probe human fetal
brain cDNA libraries. Physical mapping and sequencing of
the NotI site region was also undertaken to assay for the
presence of a CpG island, an approach that rapidly
detected coding sequence. The PAC was also used as a
template in an exon trapping system (Church et al., 1994)
resulting in the identification of the NAIP gene exons 5,
12, 16 and 17.

This multi-pronged approach resulted in the
identification of cDNA clones spanning the NAIP gene
(Figure 10). Overlapping clones were identified and
chimerism of cDNA clones was excluded on a number of
occasions by the detection of co-linearity of the cDNA
clone termini with sequence from clones of the PAC 125D9
partial Sau3AI digestion genomic library. At this time,
sequence analysis revealed the similarity of the
protein sequence encoded by the NAIP gene exons 7 through
13 to two baculoviral inhibitor of apoptosis proteins
(IAPs). Shortly thereafter, probing of Southern blots
containing DNA from consanguineous SMA families with cDNA
probes revealed deleted bands.

Both IAPs contain in their amino terminus an 80
amino acid BIR (baculovirus IAP repeat) motif that, after
an intervening sequence of approximately 30 residues, is
duplicated with ~33% identity (Clem and Miller, 1993).
The same phenomenon is observed in NAIP; amino acids
185-250 encoded by exons 6, 7 and 8 are 35% homologous to
amino acids 300-370 encoded in exons 10, 11 and 12. The
greatest stretch of homology is observed over a 53 amino
acid region with 29 identical amino acids.
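
The identity figures quoted for the repeated motif can be checked with a short calculation. The following Python sketch computes percent identity for two sequences that are assumed to be already aligned to equal length, and evaluates the ratio implied by 29 identical residues over a 53 residue stretch. The function name percent_identity and the toy peptide strings are illustrative assumptions and are not taken from Table 4.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions carrying the same residue.

    Both sequences are assumed to be pre-aligned and of equal length;
    gap characters ('-') are counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Ratio stated in the text: 29 identical residues in a 53 residue region.
best_stretch_identity = 100.0 * 29 / 53

# Toy aligned peptides (hypothetical, for illustration only).
toy_identity = percent_identity("MKTAYIAK", "MKSAYLAK")
```

On the stated counts the best conserved stretch works out to roughly 55% identity.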

In addition to the NH2 terminal IAP domain, there
exist cysteine and histidine rich zinc finger-like
motifs in the carboxy terminus of both CpIAP and OpIAP.
These motifs, which are proposed to interact with DNA
(Birnbaum et al., 1994), are not seen in NAIP (Table 4).
NAIP contains two potential transmembrane regions that
bracket an inhibitor of apoptosis domain and a contiguous
GTP binding site. Additional searches of protein domain
programs generated the following results, which are more
specific than the aforementioned protein domain evaluation:

1. Residues 1-91: an N terminal domain with no
recognizable motifs;

2. Residues 92-110: a hydrophobic domain predicted
by the MEMSAT program (Jones et al., 1994) to
be a membrane spanning domain;

3. Residues 163-477: a domain that shows homology
with baculoviral inhibitors of apoptosis
proteins followed by, and immediately upstream
of the next hydrophobic domain, a GTP/ATP
binding site;

4. Residues 479-496: hydrophobic domain predicted
by MEMSAT to be a membrane spanning domain;

5. Residues 497-1232: a possible receptor domain
containing four N-linked glycosylation sites
and a procaryotic lipid attachment site.
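
The five residue ranges listed above can be held in a small lookup table. The following Python sketch is illustrative only: the names NAIP_DOMAINS, domain_at and unassigned_gaps are assumptions introduced here, and the descriptions are shortened paraphrases of the list. It returns the listed domain for a given residue and reports the residues that the list leaves unassigned.

```python
from typing import List, Optional, Tuple

# (first residue, last residue, short description), copied from the list above.
NAIP_DOMAINS: List[Tuple[int, int, str]] = [
    (1, 91, "N terminal domain, no recognizable motifs"),
    (92, 110, "hydrophobic, predicted membrane spanning domain"),
    (163, 477, "baculoviral IAP homology with GTP/ATP binding site"),
    (479, 496, "hydrophobic, predicted membrane spanning domain"),
    (497, 1232, "possible receptor domain"),
]


def domain_at(residue: int) -> Optional[str]:
    """Return the listed domain containing a 1-based residue, or None."""
    for first, last, description in NAIP_DOMAINS:
        if first <= residue <= last:
            return description
    return None


def unassigned_gaps(domains=NAIP_DOMAINS) -> List[Tuple[int, int]]:
    """Residue ranges lying between consecutive listed domains."""
    gaps = []
    for (_, prev_last, _), (next_first, _, _) in zip(domains, domains[1:]):
        if next_first > prev_last + 1:
            gaps.append((prev_last + 1, next_first - 1))
    return gaps
```

With the ranges as listed, residues 111-162 and residue 478 fall outside every listed domain.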

We know of at least three exons that comprise 400 bp
of 5' untranslated region (5' UTR); it is possible that
more exist. A striking feature of this region is the
presence of a perfect duplication of a 90 bp region in
the 5' UTR before exon 2 and in the region bridging exons
2 and 3 (Table 4). In addition, the 3' untranslated
region comprising exon 17 has been found to contain a 550
bp interval that has potential coding region detected by
the GRAIL program with high homology (P=1.1e-37) to the
chicken integral membrane protein, occludin (Furuse et
al., 1993). There exists the possibility that this
represents a chimeric transcript. However, occludin
homologous sequence has been detected in four different
cDNA clones and two isoforms of the gene. The possibility
of the occludin sequence representing a coding exon of the
NAIP gene, with the putative 3' UTR actually being
heteronuclear RNA, is also unlikely given the consistency
with which the 3' UTR is observed and the presence of in
frame translational stop codons mapping upstream of the
region of occludin homology. Preliminary RT-PCR analysis
indicates that the occludin tract is transcribed.

Tissue Expression 

Hybridization of a Northern blot containing adult
tissue mRNA with an exon 14 probe detected bands only in
adult liver (approximately 6 and 7 kb bands) and placenta
(7 kb, Figure 6). Although the level of expression in
adult CNS is not sufficient to result in visible bands on
Northern analysis, successful reverse transcriptase-PCR
(RT-PCR) amplification of the NAIP transcript using
spinal cord, fibroblast and lymphoblast RNA suggests
transcriptional activity in these tissues.

Detection of Truncated and Internally 
Deleted Versions of the NAIP gene 

In the analysis of the PAC contig, the clones 238D12
and 30B2 were noted to show significant sequence
similarity with 125D9 but not to contain the NotI site in
PAC 125D9 that is located in NAIP exon 6. This indicated
the possibility of duplicated copies of the NAIP gene and
so further analysis by hybridization of Southern blots
containing PAC DNA with NAIP exon probes and PCR STS
content assessment was undertaken. In this manner, two
aberrant versions of the NAIP locus were detected, one
with exons 2 to 7 deleted (PAC 238D12), and another with
exons 6, 7 and 12 to 15 deleted (PACs 30B2 and 250I7).
The presence of identical sized bands in both genomic and
PAC DNA on Southern blot analysis as well as PCR results
outlined below obviate the possibility that the deletions
represent in vitro PAC artifacts rather than the in vivo
situation. Thus, genomic DNA Southern blots hybridized
with NAIP exon probes revealed more bands than would be
expected with a single intact copy of the NAIP gene. For
example, probing of blots containing BamHI restricted
genomic DNA with NAIP exons 3-11 should lead to a single
band comprised of equal sized contiguous 14.5 kb BamHI
fragments in the intact NAIP locus (Figure 11). Instead,
two additional bands are seen at 9.4 and 23 kb (Figure
14), fragments that are seen in PACs 238D12 and
30B2/250I7 respectively. The 9.4 kb BamHI fragment has
been subcloned from a cosmid and found to contain exons
8-11 with a deletion incorporating exons 2 to 7 occurring
just upstream of the 8th exon (Figure 11). The 23 kb band
is generated by a 6 kb deletion removing a BamHI site
leading to the replacement of the two contiguous 14.5 kb
BamHI fragments with a 23 kb BamHI fragment containing
exons 2 to 5 and 8 to 11 and lacking exons 5 and 6 as
depicted in Figure 11. The left side of this deletion was
mapped by the fact that amplification with primers 1933
and 1926 generated a product whereas PCR with 1933 and
1923 did not (data not shown). PCR employing primers 1927
and 1933, constructed to amplify a 4.2 kb junction
fragment spanning the 6 kb deletion (Figure 11), generated
the appropriate product as shown by size and sequencing in
both genomic DNA and PACs 30B2/250I7. The variable
dosage of both the 9.4 and 23 kb bands seen in genomic
DNA from different individuals indicates that the two
partially deleted versions of the NAIP gene are present
in multiple and polymorphic number in the general
population.
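
The size of the 23 kb band follows from simple arithmetic: when a deletion removes the BamHI site shared by two adjacent fragments, the two fragments are replaced by one whose length is their sum less the deleted interval. The Python sketch below illustrates this under the assumption that the deletion lies wholly within the two fragments and spans the site between them; the function name fused_fragment_kb is introduced here for illustration only.

```python
def fused_fragment_kb(left_kb: float, right_kb: float, deletion_kb: float) -> float:
    """Size of the single restriction fragment that replaces two adjacent
    fragments when a deletion removes the site between them.

    Assumes the deletion lies wholly inside the two fragments and spans
    the shared restriction site.
    """
    if deletion_kb <= 0 or deletion_kb >= left_kb + right_kb:
        raise ValueError("deletion must be positive and smaller than both fragments combined")
    return left_kb + right_kb - deletion_kb


# Values from the text: two contiguous 14.5 kb BamHI fragments and a 6 kb deletion.
fused = fused_fragment_kb(14.5, 14.5, 6.0)
```

With the stated values the result is 23 kb, in agreement with the band observed on the BamHI blots.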

A further level of complexity was detected with the
identification of clones from a non-SMA human fetal brain
cDNA library deleted for exons 11 and 12 (Scheme #1),
some of which also had exons 15 and 16 (Scheme #1) absent
(Figure 10). The fact that these deletions result in
frame shifts and premature protein truncation indicates
that they are, rather than normal splicing variants, more
likely the result of transcription of the deleted and
truncated versions of the NAIP gene that are present in
the general population (Figure 11). In all, a profile of a
region containing a variable number of copies of
internally deleted and truncated versions of the NAIP
locus, some of which are transcribed, has emerged from
our analysis.
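
Whether loss of one or more exons shifts the reading frame depends only on whether the number of coding nucleotides removed is a multiple of three. The Python sketch below illustrates that test. The exon lengths used are hypothetical placeholders chosen for illustration and are not the NAIP exon lengths of Table 4; the function name deletion_effect is likewise an assumption.

```python
from typing import Iterable


def deletion_effect(deleted_exon_lengths_nt: Iterable[int]) -> str:
    """Classify an exon deletion by its effect on the reading frame.

    The frame is preserved only when the total number of removed coding
    nucleotides is a multiple of three.
    """
    total = sum(deleted_exon_lengths_nt)
    if total <= 0:
        raise ValueError("at least one coding nucleotide must be deleted")
    return "in-frame" if total % 3 == 0 else "frame shift"


# Hypothetical exon lengths in nucleotides (not taken from Table 4).
example_in_frame = deletion_effect([120, 93])    # 213 nt removed
example_shifted = deletion_effect([100, 51])     # 151 nt removed
```

A frame shift of this kind typically brings a stop codon into frame shortly downstream, which is why such transcripts are predicted to give truncated protein.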

Probings of blots containing DNA from the somatic
cell hybrid HHW 1064 (Gilliam et al., 1989) with NAIP
exonic probes indicate that all forms of the NAIP gene
are confined to the 30 Mb deleted region of 5q11-13.3
contained in the derivative chromosome 5 of this cell
line. This finding has been confirmed by FISH probings
with NAIP exon 13 probe (unpublished data).

NAIP Gene Mutational Analysis 

Probing of genomic Southern blots with PCR amplified
NAIP exons 3 to 10 revealed the absence of a 4.8 kb
EcoRI/BamHI fragment containing exons 5 and 6 in the four
affected individuals of consanguineous Type III SMA
family 24561 (Figures 11 and 14). The same probing of
BamHI digested DNA from this family revealed the absence
of a 14.5 kb band, also in keeping with a loss of exons 5
and 6 as outlined above (Figures 11 and 14). Similar
results were observed in two other French Canadian SMA
families that were also believed consanguineous.

In order to confirm the proposed deletion of exons 5
and 6, primers homologous to these exons were made
(primers 1893, 1864, 1863, 1910 and 1887, identified by
arrows in Figure 11). Results of a representative PCR
amplification of DNA from the family 24561 and a second
Type III SMA consanguineous family using exon 5 specific
primers (primers 1864 and 1863) along with a simultaneous
reaction of an exon 13 sequence included to rule out a
failure of the PCR are shown in Figure 15. Absence of
amplification of exon 5 can be seen to cosegregate with
the SMA phenotype.

In order to determine if the exon 5 and 6 NAIP gene
deletion was an SMA mutation, Southern blot analysis was
conducted. An 800 bp EcoRV single copy probe that mapped
immediately to the 3' side of the 6 kb exon 5 and 6
deletion was employed (Figure 11). Hybridization of this
marker to EcoRI Southern blots detected both a 9.4 kb
EcoRI fragment containing exons 5 and 6 from the intact
NAIP locus as well as a 3 kb EcoRI band from the exon 5
and 6 deleted copy of the NAIP gene. Analysis was
conducted on EcoRI Southern blots containing DNA from
over 900 unrelated members of myotonic dystrophy, ADPKD
and cystic fibrosis families obtained from our DNA
diagnostic laboratory. The 9.4 kb band was seen in all
individuals in keeping with the presence of at least one
copy of exons 5 and 6 in each of the approximately 900
individuals tested. In addition, the 3 kb band was
observed in every individual, reflecting a virtually
complete dispersion of some form of the exon 5 through 6
deleted NAIP gene in the general population. Moreover,
the variable band dosage observed for the 3 kb band
suggested that the number of copies of the exon 5-6
deleted NAIP gene is polymorphic, possibly ranging as high
as 4 or 5 copies per genome.

PCR analysis was then extended to 110 SMA families,
employing exon 5 and 6 primers. Seventeen of 38 (45%)
Type I SMA individuals and 13 of 72 (18%) Type II and III
SMA individuals were homozygously deleted for these
exons. Assuming random assortment of chromosomes and
therefore taking the square root of the observed frequency
of homozygous exon 5 through 6 deleted individuals yields
estimated frequencies for exon 5 through 6 deleted
chromosomes of 67% in Type I SMA and 42% in Type II/III
SMA. PCR analysis was next conducted on 168 parents of
SMA children and revealed failure of amplification,
suggesting homozygous deletion of exons 5 and 6 in three
individuals. This finding was confirmed by Southern
analysis in the two cases with sufficient DNA for this
assay. The two individuals, aged 28 and 35 and both
parents of Type I SMA children, when interviewed by
telephone described themselves to be physically well,
reporting no symptoms suggestive of SMA. It was thus
concluded that the deletion of NAIP exons 5 through 6 in
isolation, while possibly reflecting more severe deletions
in individuals with SMA as outlined below, can be
clinically innocuous, associated either with an
exceedingly mild SMA or even normal phenotype. Clinical
assessment of these individuals is currently being
undertaken.
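
The chromosome frequencies quoted above for the 110 SMA families follow from the homozygote counts if, under random assortment, the homozygote frequency is the square of the deleted chromosome frequency, so that the chromosome frequency is its square root. The Python sketch below reproduces the calculation from the stated counts; the function name deleted_chromosome_frequency is an assumption introduced for illustration.

```python
import math


def deleted_chromosome_frequency(homozygous: int, total: int) -> float:
    """Estimate the frequency of deleted chromosomes from the observed
    fraction of homozygously deleted individuals.

    Assumes random assortment, so that the homozygote frequency equals
    the square of the chromosome frequency.
    """
    if total <= 0 or not 0 <= homozygous <= total:
        raise ValueError("counts must satisfy 0 <= homozygous <= total and total > 0")
    return math.sqrt(homozygous / total)


# Counts stated in the text.
type_1 = deleted_chromosome_frequency(17, 38)     # Type I SMA
type_2_3 = deleted_chromosome_frequency(13, 72)   # Type II and III SMA
```

Rounded to whole percentages these values are 67% and 42%, matching the figures given in the text.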

Judging both by the cDNA clones detected from fetal
brain libraries as well as the make-up of RT-PCR NAIP
products (Figure 2), many and possibly all truncated
copies of the NAIP gene appear to be transcribed. Given
the apparently unaffected status of the three parents of
individuals with SMA who do not have a copy of exons 4
and 5 in their genome, we believe that the exon 5 through
6 deleted version of NAIP is also translated. In keeping
with this model, removal of exons 5 and 6 results in an
in-frame deletion that extends the longest NAIP open
reading frame upstream to a start methionine in exon 3 at
nucleotide 211 (Table 4).

Furthermore, the protein sequence encoded by the
deleted exon 5 and 6 IAP motif is approximately 35%
homologous to the IAP motif encoded in exons 10 and 11,
possibly accounting for the absence of discernible
phenotype in the three exon 5 through 6 deleted
individuals. One possible model is that a single copy of
exon 5 through 6 deleted NAIP on each chromosome results
in the mild SMA phenotype, while individuals with greater
than 3 or 4 copies of the exon 4-5 deleted NAIP locus are
clinically unaffected. The possibility that duplication
of the SMA gene underlies the disease has recently been
proposed by DiDonato et al. (1994).

RT-PCR amplification of RNA from SMA and non-SMA
tissue. The results of RT-PCR amplification using RNA
from both non-SMA and SMA individuals as template are
shown in Figure 16.

We have established that at least some of the
internally deleted and truncated NAIP versions are
transcribed. In order to distinguish transcripts
from the intact NAIP gene, which would produce a
functional protein, from those that would not, an effort
was made to RT-PCR amplify transcripts that were as large
as possible. Given the 2.2 kb size of exon 14, this was
found to be one which encompassed exon 2 and the 5' end
of exon 13. No product was detected at the level of
ethidium bromide staining after first round PCR.
Therefore, second round nested amplification was
undertaken as described in respect of the previous
description of Figure 16.

A representative subset of RT-PCR experiments is
shown in Figure 16. PCR of reverse transcribed product
using RNA from non-SMA tissues as template and reverse
transcribing from exons 10 or 13 consistently amplified
product of the expected size. In contrast, similar
RT-PCR experiments on RNA from SMA tissue revealed no
amplification in five cases, in keeping with the marked
down regulation or complete absence of the intact
transcript in such individuals (Figure 16A). The RNA
obtained from the SMA tissues was no more than 12 hours
post-mortem. As we have no difficulty in amplifying
intact NAIP transcript from normal tissue which is 24 hr
post-mortem, we do not believe the difficulty in
amplification arises from RNA degradation. Furthermore,
difficulty with amplification was seen for all SMA
tissues, which argues against the possibility that NAIP
is transcribed solely in the motor neuron with depletion
of this cell type in SMA resulting in RT-PCR failure in
spinal cord tissue.

In the cases where amplification was observed,
sequencing of RT-PCR products has revealed the following
findings, as shown in Figures 16A, 16B and 16C:

(i) an in-frame deletion of codons 153 and 190 from
the 3' end of exon 5 from sample a9.

(ii) deletion of exon 6 resulting in a frame shift
with a stop codon occurring 73 nucleotides into exon 7 in
a product amplified by exon 5 primer 1864 and exon 13
primer 1974 from sample a2.

(iii) an approximate 50 nucleotide insertion in a
product amplified by exon 4 primer 1886 and exon 13
primer 1974 from sample a7.

(iv) deletion of a glutamic acid codon number 158 in
exon 5 in association with deletion of exons 11 and 12 in
a product amplified by exon 5 primer 1864 and exon 13
primer 1974 from sample a3.

(v) deletion of exons 11 and 12 introducing a frame
shift and a stop codon 14 nucleotides into exon 13 in a
product amplified by exon 9 primer 1844 and exon
13 primer 1974 in samples a2, a3, a9 and a11.

In all, employing PCR on material reverse
transcribed from exon 13, we have observed successful
amplification of the appropriate product from all 12
non-SMA tissues attempted and in only one of 12 SMA
tissues. In the latter case, sample a12, amplification
was from exons 13 to 4 only; whether the transcript also
incorporates exons 2 to 3 or 14 to 17 is unknown. We
believe that these data provide strong evidence for NAIP
being the SMA gene.

Role of NAIP Protein

The discovery of a neuronal apoptosis inhibitor
protein gene in the SMA region of chromosome 5
demonstrates that the SMA condition is a result of
deletions in the apoptosis inhibitor protein domains.
The long time survival of motorneurons is dependent on
the production of complete neuronal apoptosis inhibitor
protein. The deletion of the apoptosis inhibitor protein
domain compromises the protein activity. We have
demonstrated that approximately 70% of all SMA affected
individuals have deletions of exons 5 and 6 of the NAIP
gene on chromosome 5.

The identified region of 5q13.1 contains a variable
number of copies of intact and partially deleted forms of
the NAIP gene. While we cannot rule out the presence of
additional loci in 5q13.1 that when mutated contribute to
the SMA phenotype, we believe that mutations of the NAIP
gene are necessary and possibly sufficient for the genesis
of SMA. In contrast to most autosomal recessive diseases
where causal mutations are usually detected in the single
copy of a given gene, we propose that an SMA chromosome
is characterized by a paucity or, for severe SMA
mutations, an absence of both the intact NAIP gene as
well as that version which has had exons 3 and 4 deleted.
The genesis of such chromosomes may involve unequal
crossovers leaving the chromosome depleted for these loci
with the resulting absence of the NAIP gene product
leading to SMA.

Diagnosis of SMA 

The delineation of an SMA genotype in a given
individual is complicated by the unusual amplification of
the NAIP gene in the 5q13.1 region. Probings of Southern
blots containing genomic DNA with NAIP exon probes
invariably reveal bands resulting from copies of
internally deleted and truncated versions of the NAIP
gene. The presence of variable numbers of the different
forms of the NAIP loci in the general population is
therefore the norm and not diagnostic of an SMA mutation
per se, complicating the mutational analysis of the NAIP
gene. If the detection of genomic DNA containing altered
NAIP loci affords no proof of an SMA chromosome then, by
default, the search must be for the absence of the normal
NAIP gene. However, we have detected rare individuals
with no copies of exons 3-4 in their genome who are
clinically unaffected, an observation that is in keeping
with what we know of NAIP gene structure. Consequently,
the identification of an SMA chromosome is contingent on
the absence of both the intact as well as the exons 3-4
only deleted forms of NAIP. Assaying for their absence
is complicated by the presence of segments of normal NAIP
gene in each of the other, more extensively deleted,
forms of the NAIP locus. One can see, for example, that
if a given SMA individual had in their genome only the
deleted versions of NAIP found on PACs 238D12 and 30B2,
that is exons 1-6 deleted and exons 5, 6 and 11-14
deleted, respectively (Figures 10 and 11), they would
appear by PCR and Southern analysis to
have the exons 5-6 only deleted version of NAIP and
therefore to have non-SMA chromosomes. We believe that
many and perhaps most of the numerous exon 5-6 deleted
SMA individuals we have observed actually have
chromosomes with such a configuration, containing neither
the intact NAIP loci nor the exons 5-6 only deleted
version but rather some other combination of more
severely truncated/deleted versions of the locus with
resultant absence of intact NAIP translation. Support
for this interpretation comes from our inability to
amplify normal NAIP transcripts employing RT-PCR on RNA
from Type I SMA tissue.

In all, the evidence in support of mutations in or
the absence of the NAIP gene causing SMA includes the
following:

(i) The strong possibility that NAIP, given its
homology with baculoviral IAPs, functions as an inhibitor
of apoptosis. This characteristic is wholly compatible
with the pathology of SMA. It is noteworthy that
mutations in a regulator of apoptosis have been
previously suggested as a speculative cause of SMA
(Oppenheim, 1991; Sarnat, 1992).

(ii) The mapping of the NAIP locus within the
recombination defined critical SMA interval and the fact
that the three polymorphic markers that have been shown
to be in strong linkage disequilibrium with type I SMA,
CATT-40G1 (McLean et al., 1994), C272 (Melki et al.,
1994) and A6-1 (DiDonato et al., 1994), all map to PAC
125D9 and are present on NAIP introns (Figure 9C).

(iii) The nature of linkage disequilibrium observed
between the type 1 SMA phenotype and the 5q13.1 markers.
We have shown that the CATT-40G1 CTR sublocus, which is
frequently duplicated on non-SMA chromosomes (Roy et al.,
1994), is deleted in 80% of type 1 SMA chromosomes
compared with 45% of non-SMA chromosomes (McLean et al.,
1994). This finding is in keeping with a depletion of
the number of NAIP genes on SMA chromosomes. In a
similar fashion, Melki et al. (1994) have observed "a
heterozygote deficiency" consisting of a reduced number
of bands for the C272 CTR in Type I SMA, reflecting, they
propose, chromosomal deletions. DiDonato et al. (1994)
have also seen a striking reduction in the number of AG1
CTR sub-loci in Type I SMA individuals when compared with
non-SMA individuals. We believe that the observation by
three groups of the depletion of these intra-NAIP markers
on Type I SMA chromosomes fits well with the proposed
model of a lack or absence of both the intact and exon
5-6 deleted form of the NAIP gene underlying the disease.

(iv) The markedly increased frequency of NAIP exon
5-6 deletions observed in SMA chromosomes (approximately
67% of type 1 SMA chromosomes and 42% of type 2/3 SMA
chromosomes) compared with that detected for non-SMA
chromosomes (2-3%). As outlined above, we believe that
this phenomenon reflects the rarity or absence of both
the intact NAIP gene as well as the NAIP version with
only exons 5 through 6 deleted in the SMA chromosomes,
leaving only the more significantly internally deleted
and truncated forms of the NAIP gene present.

(v) Our consistent inability to RT-PCR amplify
appropriate size transcripts from RNA obtained from 11 of
12 SMA individuals despite success with 12 of 12 RNAs
from non-SMA individuals. Furthermore, sequencing of
those RT-PCR products that could be obtained from type 1
SMA material revealed a variety of mutations and
deletions.

(vi) The presence of a variable number of copies of
truncated and internally deleted versions of the NAIP
gene is similar to the situation reported for the
autosomal dominant polycystic kidney disease gene (ADPKD,
European Polycystic Kidney Disease Consortium, 1994). In
this case portions of unprocessed pseudogenes
corresponding to the causative gene were found to map
elsewhere on chromosome 16p. The key difference is that
with the NAIP locus the mutated form of the gene is
amplified.

In this regard the NAIP region of 5q13.1 has more
similarity to the area of chromosome 6 containing CYP21,
the gene that encodes steroid 21-hydroxylase (Wedell and
Luthman, 1993). CYP21, which when mutated causes an
autosomal recessive 21-hydroxylase deficiency, has been
observed in 0-3 copies in individuals. There also exists
in the region a variable number of inactive pseudogene
copies of CYP21 known collectively as CYP21P. The
majority of the CYP21 mutations that have been observed
in 21-hydroxylase deficiency can also be found in some
form of CYP21P and it is thought that the pseudogenes act
as a source of the mutations observed in CYP21. The
truncated and internally deleted NAIP genes are analogous
to CYP21P, only instead of the gene conversion postulated
for CYP21/CYP21P it is possible that unequal crossing
over results in chromosomes deleted for forms of the NAIP
gene that encode functional protein. The existence of a
polymorphic number of mutated NAIP genes on 5q13.1 is a
credible mechanism for generation of SMA chromosomes in
this fashion.
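
The contrast reported in point (v) above (RT-PCR products of the expected size from 1 of 12 SMA individuals against 12 of 12 non-SMA individuals) can be put on a numerical footing. The following Python sketch is illustrative only and is not part of the analyses described here: it assumes the two groups can be treated as independent samples and applies a two-sided Fisher exact test. The function name `fisher_exact_two_sided` is hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns the exact p-value as a Fraction: the summed probability of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x first-column counts in the first row.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        (prob(x) for x in range(lowest, highest + 1) if prob(x) <= observed),
        Fraction(0),
    )


# Rows: SMA, non-SMA; columns: amplified, not amplified (counts from point (v)).
p_value = fisher_exact_two_sided(1, 11, 12, 0)
print(float(p_value))
```

Exact fractions are used so that the comparison against the observed table's probability is not affected by floating-point rounding.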

Baculoviral IAPs

NAIP shows significant homology with the two
baculoviral gene products, CpIAP and OpIAP, that are
capable of inhibiting insect cell apoptosis (Table 4).
Insect cell apoptosis following baculoviral infection has
been well documented and is postulated to be a defence
mechanism. Premature death of infected insect cells
results in an attenuation of viral replication (Clem and
Miller, 1994a). CpIAP and OpIAP are thought to represent
baculoviral responses to this apoptotic mechanism. Both
act independently of other viral proteins to inhibit host
insect cell apoptosis, thereby permitting increased viral
proliferation (Clem and Miller, 1994a, 1994b). They are
known to be strongly similar only to each other; until
now no sequence similarities with proteins from other
phyla have been reported. Their mode of action is unknown,
although some interaction with DNA has been postulated.

The role and cellular localization of NAIP have not
yet been established. However, we believe that the
significant sequence similarity between NAIP and the
baculoviral IAPs, especially over such a considerable
phylogenetic distance, combined with the previously
postulated role of inappropriate apoptosis in the
pathogenesis of SMA, makes it likely that NAIP serves as an
apoptosis inhibitor in the motor neuron. Transfection
assays employing NAIP in both insect and mammalian
neuronal cells will help in this regard.

One possibility is that specific ligand binding of
the carboxy terminus of NAIP activates the GTP
binding site, which in turn activates the IAP domain. The
survival of a motor neuron might, therefore, be dependent
on the presence of the ligand(s): should the
concentration drop below a critical threshold, the IAP
domains cease to function, with ensuing cell death. This
represents a possible mechanism for the natural winnowing
of motor neurons observed in embryogenesis. The source
of the ligand might be postulated to be either muscle
cells or Schwann cells. The embryogenesis of motor
neurons might, therefore, be viewed as a competition
between the cells, with only those that make sufficient
contacts to maintain the NAIP occupancy rate surviving.

If, as postulated, NAIP does inhibit apoptosis, it
is unclear whether NAIP is a constituent of a previously
uncharacterized mammalian apoptotic pathway or a
(presumably) upstream component of the pathway involving
the human inhibitor of apoptosis, Bcl-2 (Vaux et al.,
1988; Hockenberry et al., 1990; Garcia et al., 1992).
Assays employing apoptosis inhibition deficient
baculoviral strains have revealed that Bcl-2 does not
complement the deficiency in such assays (Clem and
Miller, 1994b). If NAIP is a functional homolog of the
baculoviral IAPs, then this observation might suggest a
role in a previously uncharacterized eucaryotic apoptotic
pathway. One possibility is that NAIP represents an
intersection of a novel apoptotic mechanism with the
neurotrophic cytokine, ciliary neurotrophic factor (CNTF;
Raff et al., 1993; Meakin and Shooter, 1993) or one of
the downstream components of this pathway (Stahl et al.,
1994). CNTF null mice show a pathologic picture that is
similar to that of SMA, with normal development of the
neurons initially followed by their progressive apoptotic
depletion (Masu et al., 1993). Moreover, although
deprivation of neurotrophins under the right conditions
may result in apoptosis of cultured neurons, it is
noteworthy that CNTF is alone among these agents in not
having such apoptosis rescued by Bcl-2. This finding led
the workers who made the observation to suggest the
presence of a second eucaryotic apoptotic pathway
(Allsopp et al., 1993). The existence of such distinct
pathways may underlie the synergistic effect observed
with the marked retardation of motor neuron loss in the
wobbler mouse mutant following treatment with brain
derived neurotrophic factor (BDNF) and CNTF (Mitsumoto et
al., 1994).

The role of the lipid attachment site in NAIP is
unknown. Similar sites have been known to serve as
procaryotic protein leader sequences usually situated in
the protein's amino terminus. We have detected the
consensus pattern in 218 human sequences in the
Swiss-Prot database (release 28). These sequences are
present in a variety of functional settings:
transmembrane regions, signal sequences, extracellular
and cytoplasmic domains. One possibility is that the
lipoprotein attachment site is extracellular and binds a
constituent of the Schwann cell proteolipid in a manner
that has been postulated for the apoptosis inhibiting
interaction of integrin with the extracellular matrix
(Meredith et al., 1993; Frisch and Francis, 1994).
Furthermore, the site may play a more active role in the
hepatic form of NAIP that we have observed on
Northern blot analysis. It is noteworthy that serum
fatty acid abnormalities have been detected in children
with SMA (Kelley and Sladky, 1986).

The identified region of 5q13.1 contains, in
addition to the NAIP gene, a variable number of copies of
internally deleted and truncated forms of the gene. We
believe that a lack or absence of both the intact NAIP
gene and the NAIP locus with exons 5 and 6 deleted from a
given individual's genome is likely to cause SMA. In
this regard, the identification of NAIP has allowed us to
develop accurate molecular based diagnoses of SMA as well
as directing the formulation of conventional and genetic
therapies for these debilitating conditions.
Furthermore, the identification of genes showing homology
with the NAIP locus and proteins that interact with NAIP
may help in the continuing elucidation of apoptotic
mechanisms in mammalian cells.

EXAMPLES
Family Material

Clinical diagnoses were conducted as described in
MacKenzie et al. (1993), with all patients fulfilling the
diagnostic criteria given therein. DNA was isolated from
peripheral leukocytes as described (MacKenzie et al.,
1993).

Genetic and Linkage Disequilibrium Analyses

Genotyping with microsatellite markers was as
outlined in MacKenzie et al. (1993) and McLean et al.
(1994). The following 5q13.1 loci were used as
described: D5S112 (Brzustowitcz et al., 1990), D5S351
(Hudson et al., 1992), D5S435 (Soares et al., 1993),
D5S557 (Francis et al., 1993), D5S629 and D5S637
(Clermont et al., 1994), D5S684 (Brahe et al., 1994),
Y98T, Y97T, Y116T, Y122T and CMS (Kleyn et al., 1993),
CATT (Burghes et al., 1994; McLean et al., 1994) and
MAP1B (Lien et al., 1991).

Linkage disequilibrium analyses were conducted using
parameters that can accommodate the multiple alleles seen
with microsatellite repeats. Given the complexities
inherent in disequilibrium analyses, a total of 4
different parameters for which multiple alleles may be
used were employed. These were Dij, Dij' and [symbol illegible] as
defined in Hedrick (1987) as well as the chi square test.
Two of these, Dij and Dij', have given the best a
posteriori positional information in a previous study on
myotonic dystrophy (Podolsky et al., 1994). The patient
and control population is as outlined in McLean et al.
(1994).
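
As a rough illustration of the first two parameters, the sketch below computes Dij and a normalized Dij' for every allele pair from a table of two-locus haplotype counts. It assumes the usual textbook forms, Dij = xij - pi*qj and Dij' = Dij/Dmax, which should be checked against Hedrick (1987) before being relied upon. The function name, the input layout and the toy counts are hypothetical and are not data from this study.

```python
from collections import defaultdict


def pairwise_disequilibrium(haplotype_counts):
    """Return {(i, j): (D_ij, D'_ij)} from {(allele_i, allele_j): count}."""
    total = sum(haplotype_counts.values())
    p = defaultdict(float)  # allele frequencies at the first locus
    q = defaultdict(float)  # allele frequencies at the second locus
    for (i, j), n in haplotype_counts.items():
        p[i] += n / total
        q[j] += n / total

    result = {}
    for i in list(p):
        for j in list(q):
            x_ij = haplotype_counts.get((i, j), 0) / total
            d = x_ij - p[i] * q[j]
            # Largest magnitude D could take given the allele frequencies.
            if d < 0:
                d_max = min(p[i] * q[j], (1 - p[i]) * (1 - q[j]))
            else:
                d_max = min(p[i] * (1 - q[j]), (1 - p[i]) * q[j])
            d_prime = d / d_max if d_max > 0 else 0.0
            result[(i, j)] = (d, d_prime)
    return result


# Toy haplotype counts (hypothetical, for illustration only).
toy = {("A1", "B1"): 40, ("A1", "B2"): 10, ("A2", "B1"): 10, ("A2", "B2"): 40}
ld = pairwise_disequilibrium(toy)
for pair in sorted(ld):
    print(pair, round(ld[pair][0], 3), round(ld[pair][1], 3))
```

Working per allele pair, rather than collapsing each marker to two alleles, is what lets this kind of measure accommodate multi-allelic microsatellites.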

Cosmid, YAC and PAC Arraying

Cosmid and YAC contig assembly was as outlined in
Roy et al. (1994). PACs were constructed as outlined in
Ioannou et al. (1994). Using these procedures three PAC
libraries have been constructed with a combined total of
microtiter dishes (Ioannou et al., unpublished results).
Pools derived from the three libraries (designated LLNL
PAC1, RPCI1 and RPCI2) were screened with 5q13.1 STSs.
Positive PACs were arranged into contiguous and
overlapping arrays by further analysis with additional
STSs combined with probings of Southern blots containing
PAC DNA by single copy genomic DNA and cDNA probes.

DNA Manipulation and Analysis

Four genomic libraries containing PAC 125D9 insert
were constructed by BamHI, BamHI/NotI, total and partial
Sau3AI (selected for 5 kb insert size) digestions of the
PAC genomic DNA insert and subcloned into Bluescript
vector. Sequencing of approximately 400 bp of both
termini of 200 five kb clones from the partial Sau3AI
digestion library in the manner of Chen et al. (1993) was
undertaken.

Coding sequences from the PACs were isolated by the
exon amplification procedure as described by Church et
al. (1994). PACs were digested with BamHI or BamHI and
BglII and subcloned into pSPL3. Pooled clones of each
PAC were transfected into COS-1 cells. After a 24 h
transfection, total RNA was extracted. Exons were cloned
into pAMP10 (Gibco, BRL) and sequenced utilizing primer
SD2 (GTG AAC TGC ACT GTG ACA AGC TGC).

DNA sequencing was conducted on an ABI 373A
automated DNA sequencer. Two commercial human fetal
brain cDNA libraries in lambda gt (Stratagene) and lambda
ZAP (Clontech) were used for candidate transcript
isolation. The Northern blot was commercially acquired
(Clontech) and probing was performed using standard
methodology.

In general, primers used in the paper for PCR were
selected for Tm's of 60°C and can be used with the
following conditions: 30 cycles of 94°C, 60s; 60°C, 60s;
72°C, 90s. PCR primer mappings are as referred to in the
figure legends and text. Primer sequences are as
follows:

1258 ATg CTT ggA TCT CTA gAA Tgg - Sequence ID No. 3
1285 AgC AAA gAC ATg Tgg Cgg AA - Sequence ID No. 4
1343 CCA gCT CCT AgA gAA AgA Agg A - Sequence ID No. 5
1844 gAA CTA Cgg CTg gAC TCT TTT - Sequence ID No. 6
1863 CTC TCA gCC TgC TCT TCA gAT - Sequence ID No. 7
1864 AAA gCC TCT gAC gAg Agg ATC - Sequence ID No. 8
1884 CgA CTg CCT gTT CAT CTA CgA - Sequence ID No. 9
1886 TTT gTT CTC CAg CCA CAT ACT - Sequence ID No. 10
1887 CAT TTg gCA TgT TCC TTC CAA g - Sequence ID No. 11
1893 gTA gAT gAA TAC TgA TgT TTC ATA ATT - Sequence ID No. 12
1910 TgC CAC TgC CAg gCA ATC TAA - Sequence ID No. 13
1919 TAA ACA ggA CAC ggT ACA gTg - Sequence ID No. 14
1923 CAT gTT TTA AgT CTC ggT gCT CTg - Sequence ID No. 15
1926 TTA gCC AgA TgT gTT ggC ACA Tg - Sequence ID No. 16
1927 gAT TCT ATg TgA TAg gCA gCC A - Sequence ID No. 17
1933 gCC ACT gCT CCC gAT ggA TTA - Sequence ID No. 18
1974 gCT CTC AgC TgC TCA TTC AgA T - Sequence ID No. 19
1979 ACA AAg TTC ACC ACg gCT CTg - Sequence ID No. 20
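
A quick consistency check on the stated primer design target and cycling program can be sketched as follows. The Wallace rule (2°C per A/T, 4°C per G/C) is assumed here purely for illustration; the text does not say how the Tm's were estimated. The helper names are hypothetical, and the two primers are copied from the list above.

```python
def wallace_tm(primer):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.replace(" ", "").upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def program_seconds(cycles, step_seconds):
    """Total hold time of a cycling program, ignoring ramping between steps."""
    return cycles * sum(step_seconds)


primers = {
    "1285": "AgC AAA gAC ATg Tgg Cgg AA",
    "1343": "CCA gCT CCT AgA gAA AgA Agg A",
}
for name, seq in primers.items():
    print(name, wallace_tm(seq))

# 30 cycles of 94 C for 60 s, 60 C for 60 s and 72 C for 90 s.
total = program_seconds(30, [60, 60, 90])
print(total / 60, "minutes")
```

Under this rule primer 1285 comes out at 60 while the longer primer 1343 comes out somewhat higher, so the stated 60°C is probably best read as a nominal annealing target rather than an exact value for every primer.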



RT-PCR

cDNA was synthesized in a 20 µl reaction utilizing 7
µg of total RNA. The RNA was denatured for 5 minutes at
95°C and cooled to 37°C. Reverse transcription was
performed at 42°C for 1 hour after addition of 5 µl 5X
reverse transcription buffer, 2 µl 0.1 M DTT, 4 µl 2.5 mM
dNTPs, 8 units RNasin, 25 ng cDNA primer (1285) and 400
units of MMLV (Gibco, BRL). 1 µl of cDNA was utilized as
template in subsequent 50 µl PCR reactions. 1 µl of this
primary PCR was utilized as template for secondary PCR
amplifications.
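
Because primer 1285 primes first-strand cDNA synthesis, it should be complementary to the mRNA, that is, the reverse complement of the sense strand given in SEQ ID NO: 1. The sketch below checks this against a short stretch of SEQ ID NO: 1 (four blocks from the row ending at position marker 1680 in the sequence listing). The fragment was transcribed by hand from the listing, with one ambiguous character read as G, so it should be treated as an assumption; the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (spaces ignored)."""
    return seq.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


# Sense-strand fragment of SEQ ID NO: 1 (hand-transcribed; see lead-in).
fragment = "CAGCGCCAGTTTCCGCCACATGTCTTTGCTTGATATCTCT"
primer_1285 = "AgC AAA gAC ATg Tgg Cgg AA"

target = reverse_complement(primer_1285)
offset = fragment.find(target)  # -1 would mean the primer does not anneal here
print(target, offset)
```

A non-negative offset indicates that the primer anneals to the sense strand in the antisense orientation, as expected for a cDNA primer.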

Sequence Analysis

Primary DNA sequence data was edited with the TED
program (Gleeson and Hillier, 1991). As many of the
partially sequenced 200 five kb clones from the partial
Sau3AI digestion library as possible were arranged into
overlapping arrays using the XBAP Staden package (Dear
and Staden, 1991). Sequence data was also assembled and
analyzed using the GCG sequence analysis package (Genetics
Computer Group, 1991). Protein domain homologies were
found by searching the Prosite protein database (Bairoch
and Bucher, 1993). The MEMSAT program was also used to
search for transmembrane domain regions (Jones et al.,
1994).

TABLE 1

The YACs isolated in this study, their size and library of origin are listed. NCE:
National Centers of Excellence, Toronto, Ontario, Canada. ICRF: Imperial Cancer
Research Fund. CEPH: Centre d'Etude du Polymorphisme Humaine.

YAC       SIZE      LIBRARY

12H1      560 kb    NCE
12H4      270 kb    NCE
24D6      750 kb    NCE
27H5      630 kb    NCE
33H10     1.3 Mb    NCE
H0416     390 kb    ICRF
E0320     440 kb    ICRF
G1138     850 kb    ICRF
A0848     350 kb    ICRF
D06100    550 kb    ICRF
D0981     450 kb    ICRF
919C2     800 kb    CEPH
755B12    1 Mb      CEPH
754H5     500 kb    CEPH

TABLE 2

PROBE            SOURCE/REFERENCE

YD33             STS developed from Alu-5'-trp PCR product of YAC D06100
Y13.1            STS developed from inter-Alu PCR product of YAC 12H1 (this study)
Y14.1            STS developed from Alu-5'-ura PCR product of YAC 12H4 (this study)
Y15.1            STS developed from Alu-5'-ura PCR product of YAC 12H4 (this study)
Y9.2             STS developed from inter-Alu-5' PCR product of YAC 27H5 (this study)
Y5.6             STS developed from inter-Alu-5' PCR product of YAC 24D6 (this study)
Y11.2            STS developed from Alu-5'-trp PCR product of YAC 33H10 (this study)
pZY8             subcloned 1.3 kb HindIII fragment from cosmid 250B6 (this study)
H7T733           Alu 33-T7 PCR product from cosmid 1H7 (this study)
p151.2           subcloned 1.2 kb inter-Alu PCR product of cosmid [illegible] (this study)
G10T333          Alu 33-T3 PCR product of cosmid 1G10 (this study)
p402.1           subcloned 2.1 kb BamHI/HindIII fragment of cosmid 40G1 (this study)
G3T733           Alu 33-T7 PCR product of cosmid 1G3 (this study)
[illegible]      liver transcript isolated with subcloned 1.1 kb BamHI/[illegible] fragment from 58012 (this study)
p2281.8          subcloned 1.8 kb HindIII fragment of cosmid 228C8 (this study)
F933             inter-Alu PCR product of cosmid 1F9 (this study)
pGA1             fetal brain transcript isolated with cosmid 250B6
β-glucuronidase  (Oshima et al., 1987)
MAP1B            (Lien et al., 1991)
Y122T            (Kleyn et al., 1993)
D5S351           (Yaraghi et al., in press)
CMS-1            (Kleyn et al., 1993)
D5S557           (Francis et al., 1993)
Y98T             (Kleyn et al., 1993)
D5S112           (Brzustowitcz et al., 1990)
Y97T             (Kleyn et al., 1993)
[four entries illegible in source]
CATT-1           (Burghes et al., 1994; McLean et al., in press)
Y55U             (Kleyn et al., 1993)
D5S127           (Sherrington et al., 1991)
Y38T             (Kleyn et al., 1993)
D5S435           (Soares et al., 1993)
D5S125           (Hudson et al., 1992)
Y107U            (Kleyn et al., 1993)
Y97U             (Kleyn et al., 1993)
D5F149 (C212)    (Melki et al., 1994)
D5F151 (C171)    (Melki et al., 1994)
D5F150 (C272)    (Melki et al., 1994)
D5F153 (C161)    (Melki et al., 1994)
D5S637           (Clermont et al., 1994)
D5S629           (Clermont et al., 1994)

TABLE 3

The GA1 component of the neuronal apoptosis inhibitor protein gene compared for
homology with the inhibitor of apoptosis polypeptides of the viruses Cydia pomonella and Orgyia
pseudotsugata.

                  1                                                   50
Cydia pomonella
Orgyia pseudots
cGA1-consensus    TRTVDKPQKM ATQQKASDER ISQFDHNLLP ELSALLGLDA VQLAKELEEE

                  51                                                 100
Cydia pomonella
Orgyia pseudots
cGA1-consensus    EQKERAKMQK GYNSQMRSEA KRLKTFVTYE PYSSWIPQEM AAAGFYFTGV

                  101                                                150
Cydia pomonella
Orgyia pseudots                                                          HS
cGA1-consensus    KSGIQCFCCS LILFGAGLTR LPIEDHKRFH PDCGFLLNKD VGNIAKYDIR

                  151                                                200
Cydia pomonella            H SDLR..LEEV RLKTFEKWP. .VSFLSPETM AXNGFYVLGR
Orgyia pseudots   SRAIGAPQEG ADMK..KKAA RLGTYTNWP. .VQFLEPSRM AASGFYYLGR
cGA1-consensus    VKNLKSRLRG GKMRYQEEEA RLASFRNWPF YVQGISPCVL SEAGFVFTGK

                  201                                                250
Cydia pomonella   SDEVRCAFCK VEIMRHXEGE DPAADHKKWA PQCPFVKGID VCGSI
Orgyia pseudots   GDEVRCAFCK VEITNWVRGD DPETDHKRWA PQCPFVRK
cGA1-consensus    QDTVQCFSCG GCLGNWEEGD DPWKEHAKWF PKCEFLRSKK SSEEITQYIQ

                  251                                                300
Cydia pomonella   VTT NNIQNTTTHD TIIGPA HPKYAHEAAR VKSFHNWPRC
Orgyia pseudots   NA HDTPHDRAPP ARSAAA HPQYATEAAR LRTFAEWPRG
cGA1-consensus    SYKGFVDITG EHFVNSWVQR ELPMASAYCN DSIFAYEELR LDSFKDWPRE

                  301                                                350
Cydia pomonella   MKQRPEOMAD AGFFYTGYGD NTKCFYCDGG LKDWEPEDVP WEQHVRWFDR
Orgyia pseudots   LKQRPEELAE AGFFYTGQGD KTRCFCCDGG LKDWEPDDAP WQQHARWYDR
cGA1-consensus    SAVGVAALAK AGLFYTGIKD IVQCFSCGGC LEKWQEGDDP LDDHTRCFPN

                  351                                                400
Cydia pomonella   CAYVQLVKGR DYVQKVI... TEACVLPGEN TTVSTAAPVS EPIPETKIEK
Orgyia pseudots   CEYVLLVKGR DFVQRVM... TEACWRDAD N EPHIER
cGA1-consensus    CPFLQNMKSS AEVTPDLQSR GELCELLETT SESNLEDSIA VGPIVPEMAQ

                  401                                                450
Cydia pomonella   EPQ VEDSKLCKIC YVEE CIV CFVPCGHWA
Orgyia pseudots   PAV EAE VADDRLCKIC LGAE KTV CFVPCGHWA
cGA1-consensus    GEAQWFQEAK NLNEQLRAAY TSASFRHMSL LDISSDLATD HLLGCDLSIA

                  451                                                500
Cydia pomonella   CAKCALSVDK CPMCRKIVTS VLKVYFS
Orgyia pseudots   CGKCAAGVTT CPVCRQQLDK AVRMYQV...
cGA1-consensus    SKHISKPVQE PLVLPEVFGN LNSVMCVEGE AGSGKTVLLK KIAFLWASGC

TABLE 3 (continued)

                  501                                                550
cGA1-consensus    CPLLNRFQLV FYLSLSSTRP DEGLASIICD QLLEKEGSVT EMCMRNIIQQ
cGA1-consensus    LKNQVLFLLD DYKEICSIPQ VIGKLIQKNH LSRTCLLIAV RTNRARDIRR
cGA1-consensus    YLETILEIQA FPFYNTVCIL RKLFSHNMTR LRKFMVYFGK NQSLQKIQKT
cGA1-consensus    PLFVAAICAH WFQYPFDPSF DDVAVFKSYM ERLSLRNKAT AEILKATVSS
cGA1-consensus    CGELALKGFF SCCFEFNDDD LAEAGVDEDE DLTMCLMSKF TAQRLRPFYR
cGA1-consensus    FLSPAFQEFL AGMRLIELLD SDRQEHQDLG LYHLKQINSP MMTVSAYNNF
cGA1-consensus    LNYVSSLPST KAGPKIVSHL LKLVDNKESL ENISENDDYL KHQPEISLQM
cGA1-consensus    QLLRGLWQIC PQAYFSMVSE HLLVLALKTA YQSNTVAACS PFVLQFLQGR
cGA1-consensus    TLTLGAWILQ YFFDHPESLS LLRSIHFSIR GNKTSPRAKF SVLETCFDKS
cGA1-consensus    QVPTIDQDYA SAFEPHNEWE RNLAEKEDNV KSYMDMQRRA SPDLSTGYWK
cGA1-consensus    LSFKQYKIPC LEVDVNDIDV VGQDMLEILM TVFSASQRIE LKLNHSRGFI
cGA1-consensus    ESIRPALELS KASVTKCSIS KLELSAAEQE LLLTLPSLES LEVSGTIQSQ
cGA1-consensus    DQIFFNLDKF LCLKELSVDL EGNINVFSVI PEEFPNFHHM EKLLIQISAE
cGA1-consensus    S

SEQUENCE LISTING

(1) GENERAL INFORMATION:

     (i) APPLICANT:
          (A) NAME: UNIVERSITY OF OTTAWA
          (B) STREET: 550 Cumberland Street
          (C) CITY: Ottawa
          (D) STATE: Ontario
          (E) COUNTRY: Canada
          (F) POSTAL CODE (ZIP): K1N 6N5
          (G) TELEPHONE: 613-564-5804
          (H) TELEFAX: 613-564-5952

          (A) NAME: RESEARCH DEVELOPMENT CORPORATION OF JAPAN
          (B) STREET: 4-1-8, Honcho, Kawaguchi-shi
          (C) CITY: Saitama 332
          (E) COUNTRY: Japan
          (F) POSTAL CODE (ZIP): none

          (A) NAME: MacKENZIE, Alex E.
          (B) STREET: 35 Rockliffe Way
          (C) CITY: Ottawa
          (D) STATE: Ontario
          (E) COUNTRY: Canada
          (F) POSTAL CODE (ZIP): K1N 1A3

          (A) NAME: KORNELUK, Robert G.
          (B) STREET: 1901 Tweed Avenue
          (C) CITY: Ottawa
          (D) STATE: Ontario
          (E) COUNTRY: Canada
          (F) POSTAL CODE (ZIP): K1G 2L8

          (A) NAME: MAHADEVAN, Mani S.
          (B) STREET: 818 South Grammon Road, Apt. 4
          (C) CITY: Madison
          (D) STATE: Wisconsin
          (E) COUNTRY: United States of America
          (F) POSTAL CODE (ZIP): 53719

          (A) NAME: McLEAN, Michael
          (B) STREET: 1 Halesmanor Crt.
          (C) CITY: Guelph
          (D) STATE: Ontario
          (E) COUNTRY: Canada
          (F) POSTAL CODE (ZIP): N1G 4E1

          (A) NAME: ROY, Natalie
          (B) STREET: 6 McLeod Street
          (C) CITY: Ottawa
          (D) STATE: Ontario
          (E) COUNTRY: Canada
          (F) POSTAL CODE (ZIP): K2P 0Z5

          (A) NAME: IKEDA, Joh-E.
          (B) STREET: 31-1 Kamineguro 5-chome, Meguro-ku
          (C) CITY: Tokyo
          (E) COUNTRY: Japan
          (F) POSTAL CODE (ZIP): 153

    (ii) TITLE OF INVENTION: NEURONAL APOPTOSIS INHIBITOR PROTEIN,
         GENE SEQUENCE AND MUTATIONS CAUSATIVE OF SPINAL MUSCULAR ATROPHY

   (iii) NUMBER OF SEQUENCES: 20

    (iv) COMPUTER READABLE FORM:
          (A) MEDIUM TYPE: Floppy disk
          (B) COMPUTER: IBM PC compatible
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 5502 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

   (iii) HYPOTHETICAL: NO

    (iv) ANTI-SENSE: NO

  (viii) POSITION IN GENOME:
          (C) UNITS: bp

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

TTCCGGCTGG ACGTTGCCCT GTGTACCTCT TCGACTGCCT GTTCATCTAC GACGAACCCC 60

GGGTATTGAC CCCAGACAAC AATGCCACTT CATATTGCAT GAAGACAAAA GGTCCTGTGC 120

TCACCTGGGA CCCTTCTGGA CGTTGCCCTG TGTTCCTCTT CGCCTGCCTG TTCATCTACG 180

ACGAACCCCG GGTATTGACC CCAGACAACA ATGCCACTTC ATATTGGGGA CTTCGTCTGG 240

GATTCCAAGG TGCATTCATT GCAAAGTTCC TTAAATATTT TCTCACTGCT TCCTACTAAA 300

GGACGGACAG AGCATTTGTT CTTCAGCCAC ATACTTTCCT TCCACTGGCC AGCATTCTCC 360

TCTATTAGAC TAGAACTGTG GATAAACCTC AGAAAATGGC CACCCAGCAG AAAGCCTCTG 420

ACGAGAGGAT CTCCCAGTTT GATCACAATT TGCTGCCAGA GCTGTCTGCT CTTCTGGGCC 480

TAGATGCAGT TCAGTTGGCA AAGGAACTAG AAGAAGAGGA GCAGAAGGAG CGAGCAAAAA 540

TGCAGAAAGG CTACAACTCT CAAATGCGCA GTGAAGCAAA AAGGTTAAAG ACTTTTGTGA 600

CTTATGAGCC GTACAGCTCA TGGATACCAC AGGAGATGGC GGCCGCTGGG TTTTACTTCA 660

CTGGGGTAAA ATCTGGGATT CAGTGCTTCT GCTGTAGCCT AATCCTCTTT GGTGCCGGCC 720

TCACGAGACT CCCCATAGAA GACCACAAGA GGTTTCATCC AGATTGTGGG TTCCTTTTGA 780

ACAAGGATGT TGGTAACATT GCCAAGTACG ACATAAGGGT GAAGAATCTG AAGAGCAGGC 840

TGAGAGGAGG TAAAATGAGG TACCAOAAG AGGAGGCTAG ACTTGCATCC TTCAGGAACT 900

GGCCATTTTA TGTCCAAGGG ATATCCCCTT GTGTGCTCTC AGAGGCTGGC TTTGTCTTTA 960

CAGGTAAACA GGACACGGTA CAGTGTTTTT CCTGTGGTGG ATGTTTAGGA AATTGGGAAG 1020

AAGGAGATGA TCCTTGGAAG GAACATGCCA AATGGTTCCC CAAATGTGAA TTTCTTCGGA 1080

GTAAGAAATC CTCAGAGGAA ATTACCCAGT ATATTCAAAG CTACAAGGGA TTTGTTGACA 1140

TAACGGGAGA ACATTTTGTG AATTCCTGGG TCCAGAGAGA ATTACCTATG GCATCAGCTT 1200

ATTGCAATGA CAGCATCTTT GCTTACGAAG AACTACGGCT GGACTCTTTT AAGGACTGGC 1260

CCCGGGAATC AGCTGTGGGA GTTGCAGCAC TGGCCAAAGC AGGTCTTTTC TACACAGGTA 1320

TAAAGGACAT CGTCCAGTGC TTTTCCTGTG GAGGGTGTTT AGAGAAATGG CAGGAAGGTG 1380

ATGACCCATT AGACGATCAC ACCAGATGTT TTCCCAATTG TCCATTTCTC CAAAATATGA 1440

AGTCCTCTGC GGAAGTGACT CGAGACCTTC AGAGCCGTGG TGAACTTTGT GAATTACTGG 1500

AAACCACAAG TGAAAGCAAT CTTGAAGATT CAATAGCAGT TGGTCCTATA GTGCCAGAAA 1560

TGGCACAGGG TGAAGCCCAG TGGTTTCAAG AGGCAAAGAA TCTGAATGAG CAGCTGAGAG 1620

CAGCTTATAC CAGCGCCAGT TTCCGCCACA TGTCTTTGCT TGATATCTCT TCCGATCTGG 1680

CCACGGACCA CTTGCTGGGC TGTGATCTGT CTATTGCTTC AAAACACATC AGCAAACCTG 1740

TGCAAGAACC TCTGGTGCTG CCTGAGGTCT TTGGCAACTT GAACTCTGTC ATGTGTGTGG 1800

AGGGTGAAGC TGGAAGTGGA AAGACGGTCC TCCTGAAGAA AATAGCTTTT CTGTGGGCAT 1860

CTGGATGCTG TCCCCTGTTA AACAGGTTCC AGCTGGTTTT CTACCTCTCC CTTAGTTCCA 1920

CCAGACCAGA CGAGGGGCTG GCCAGTATCA TCTGTGACCA GCTCCTAGAG AAAGAAGGAT 1980

CTGTTACTGA AATGTGCATG AGGAACATTA TCCAGCAGTT AAAGAATCAG GTCTTATTCC 2040

TTTTAGATGA CTACAAAGAA ATATGTTCAA TCCCTCAAGT CATAGGAAAA CTGATTCAAA 2100

AAAACCACTT ATCCCGGACC TGCCTATTGA TTGCTGTCCG TACAAACAGG GCCAGGGACA 2160

TCCGCCGATA CCTAGAGACC ATTCTAGAGA TCCAAGCATT TCCCTTTTAT AATACTGTCT 2220

GTATATTACG GAAGCTCTTT TCACATAATA TGACTCGTCT GCGAAAGTTT ATGGTTTACT 2280

TTGGAAAGAA CCAAAGTTTG CAGAAGATAC AGAAAACTCC TCTCTTTGTG GCGGCGATCT 2340

GTGCTCATTG GTTTCAGTAT CCTTTTGACC CATCCTTTGA TGATGTGGCT GTTTTCAAGT 2400

CCTATATGGA ACGCCTTTCC TTAAGGAACA AAGCGACAGC TGAAATTCTC AAAGCAACTG 2460

TGTCCTCCTG TGGTGAGCTG GCCTTGAAAG GGTTTTTTTC ATGTTGCTTT GAGTTTAATG 2520

ATGATGATCT CGCAGAAGCA GGGGTTGATG AAGATGAAGA TCTAACCATG TGCTTGATGA 2580

GCAAATTTAC AGCCCAGAGA CTAAGACCAT TCTACCGGTT TTTAAGTCCT GCCTTCCAAG 2640

AATTTCTTGC GGGGATGAGG CTGATTGAAC TCCTGGATTC AGATAGGCAG GAACATCAAG 2700

ATTTGGGACT GTATCATTTC AAACAAATCA ACTCACCCAT GATGACTGTA AGCGCCTACA 2760

ACAATTTTTT GAACTATGTC TCCAGCCTCC CTTCAACAAA AGCAGGGCCC AAAATTGTGT 2820

CTCATTTGCT CCATTTAGTG GATAACAAAG AGTCATTGGA GAATATATCT GAAAATGATG 2880

ACTACTTAAA GCACCAGCCA GAAATTTCAC TGCAGATGCA GTTACTTAGG GGATTGTGGC 2940

AAATTTGTCC ACAAGCTTAC TTTTCAATGG TTTCAGAACA TTTACTGGTT CTTGCCCTGA 3000

AAACTGCTTA TCAAAGCAAC ACTGTTGCTG CGTGTTCTCC ATTTGTTTTG CAATTCCTTC 3060

AAGGGAGAAC ACTGACTTTG GGTGCGCTTA ACTTACAGTA CTTTTTCGAC CACCCAGAAA 3120

GCTTGTCATT GTTGAGGAGC ATCCACTTCT CAATACGAGG AAATAAGACA TCACCCAGAG 3180

CACATTTTTC AGTTCTGGAA ACATGTTTTG ACAAATCACA GGTGCCAACT ATAGATCAGG 3240

ACTATGCTTC TGCCTTTGAA CCTATGAATG AATGGGAGCG AAATTTAGCT GAAAAAGAGG 3300

ATAATGTAAA GAGCTATATG GATATGCAGC GCAGGGCATC ACCAGACCTT AGTACTGGCT 3360

ATTGGAAACT TTCTCCAAAG CAGTACAAGA TTCCCTGTCT AGAAGTCGAT GTGAATGATA 3420

TTGATGTTGT AGGCCAGGAT ATGCTTGAGA TTCTAATGAC AGTTTTCTCA GCTTCACAGC 3480

GCATCGAACT CCATTTAAAC CACAGCAGAG GCTTTATAGA AAGCATCCGC CCAGCTCTTG 3540

AGCTGTCTAA GGCCTCTGTC ACCAAGTGCT CCATAAGCAA GTTGGAACTC AGCGCAGCCG 3600

AACAGGAACT GCTTCTCACC CTGCCTTCCC TGGAATCTCT TGAAGTCTCA GGGACAATCC 3660

AGTCACAAGA CCAAATCTTT CCTAATCTGG ATAAGTTCCT GTGCCTGAAA GAACTGTCTG 3720

TGGATCTGGA GGGCAATATA AATGTTTTTT CAGTCATTCC TGAAGAATTT CCAAACTTCC 3780

ACCATATGGA GAAATTATTG ATCCAAATTT CAGCTGAGTA TGATCCTTCC AAACTAGTTG 3840

CCAGTTTGCC AAATTTTATT TCTCTGAAGA TATTAAATCT TGAAGGCGAG CAATTTCCTG 3900

ATGAGGAAAC ATCAGAAAAA TTTGCCTACA TTTTACkSTTC TCTTAGTAAC CTGGAAGAAT 3960

TGATCCTTCC TACTGGGGAT GGAATTTATC GAGTGGCCAA ACTGATCATC CAGCAGTGTC 4020

AGCAGCTTCA TTGTCTCCGA GTCCTCTCAT TTTTCAAGAC TTTGAATGAT GACAGCGTGG 4080

TGGAAATTGG TTAAAAATGT GTCTGCAGGC ACACAGGACG TGCCTTCACC CCCATCTGAC 4140

TATGTGGAAA GAGTTGACAG TCCCATGGCA TACTCTTCCA ATGGCAAAGT GAATGACAAG 4200

CGGTTTTATC CAGAGTCTTC CTATAAATCC ACGCCGGTTC CTGAAGTGGT TCAGGAGCTT 4260

CCATTAACTT CGCCTGTCGA TGACTTCAGG CAGCCTCGTT ACAGCAGCGG TGGTAACTTT 4320

GAGACACCTT CAAAAAGAGC ACCTGCAAAG GGAAGAGCAG GAAGGTCAAA GAGAACAGAG 4380

CAAGATCACT ATGAGACAGA CTACACAACT GGCGG0GAGT CCTGTGATGA GCTGGAGGAG 4440

GACTGGATCA GGGAATATCC ACCTATCACT TCAGATCAAC AAAGACAACT GTACAAGAGG 4500

AATTTTGACA [...] CAGAACTTGA TGAGATCAAT 4560

AAAGAACTCT CCCGTTTGGA TAAAGAATTG GATGACTATA GAGAAGAAAG TGAAGAGTAC 4620

ATGGCTGCTG CTGATGAATA CAATAGACTG AAGCAAGTGA AGGGATCTGC AGATTACAAA 4680

AGTAAGAAGA ATCATTGCAA GCAGTTAAAC AGCAAATTGT CACACATCAA GAAGATGGTT 4740

GGAGACTATG ATAGACAGAA AACATAGAAG GCTGATGCCA AGTTGTTTGA GAAATTAAGT 4800

ATCTGACATC TCTGCAATCT TCTCAGAAGG CAAATGACTT TGGACCATAA CCCCGGAAGC 4860

CAAACCTCTG TGAGCATCAC AGTTTTGGTT GCTTTAATAT CATCAGTATT GAAGCATTTT 4920

CACTTTTTTC CACATJVAGGA AACTGGGTTC CTGCJJkTGAA GTCTCTGAAG TGAAACTGCT 5220

TGTTTCCTAG CACACACTTT TGGTTAAGTC TGTTTTATGA CTTCATTAAT AATAAATTCC 5280

GGCATCATAC AGCTACTCCT CCCTACCGCC ACCTCCACAG ACACCACTCT CCTGGTTCCA 5340

TCTCCTCTGC TGCTTCTAGC TCCCTGCTCT GGCTTCAAGG TGCGCAGGAC CTGCTTCCTT 5400

GGTGATCCTC TGTAGTCTCC CACACCCCAC ATTATCTACA AACTGATGAC TCCTAATTTA 5460

CATCTCCAGC TCAGACCTCT CCATCAATCC CAACGCATAC AC 5502
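
The relationship between the two listed sequences can be spot-checked by translating the start of the open reading frame. The sketch below takes 54 bases of SEQ ID NO: 1, beginning at the ATG in the row ending at position marker 420, and translates them with the standard genetic code; the result can be compared with the opening residues of SEQ ID NO: 2. The fragment was transcribed by hand from the listing and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for each codon position.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at a stop codon."""
    protein = []
    for k in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[k:k + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hand-transcribed from SEQ ID NO: 1 (assumption; see lead-in).
orf_start = "ATGGCCACCCAGCAGAAAGCCTCTGACGAGAGGATCTCCCAGTTTGATCACAAT"
peptide = translate(orf_start)
print(peptide)
```

In three-letter code the translated fragment reads Met Ala Thr Gln Gln Lys Ala Ser Asp Glu Arg Ile Ser Gln Phe Asp His Asn.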



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(2) 



INFORMATION FOR SBQ ID NO: 2: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1233 amino acids 

(B) TYPE: amino acid 
(O) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Ala Thr Gln Gln Lys Ala Ser Asp Glu Arg Ile Ser Gln Phe Asp 
1 5 10 15 

His Asn Leu Leu Pro Glu Leu Ser Ala Leu Leu Gly Leu Asp Ala Val 
20 25 30 

Gln Leu Ala Lys Glu Leu Glu Glu Glu Glu Gln Lys Glu Arg Ala Lys 
35 40 45 

Met Gln Lys Gly Tyr Asn Ser Gln Met Arg Ser Glu Ala Lys Arg Leu 
50 55 60 

Lys Thr Phe Val Thr Tyr Glu Pro Tyr Ser Ser Trp Ile Pro Gln Glu 
65 70 75 80 

Met Ala Ala Ala Gly Phe Tyr Phe Thr Gly Val Lys Ser Gly Ile Gln 
85 90 95 

Cys Phe Cys Cys Ser Leu Ile Leu Phe Gly Ala Gly Leu Thr Arg Leu 
100 105 110 

Pro Ile Glu Asp His Lys Arg Phe His Pro Asp Cys Gly Phe Leu Leu 
115 120 125 

Asn Lys Asp Val Gly Asn Ile Ala Lys Tyr Asp Ile Arg Val Lys Asn 
130 135 140 

Leu Lys Ser Arg Leu Arg Gly Gly Lys Met Arg Tyr Gln Glu Glu Glu 
145 150 155 160 

Ala Arg Leu Ala Ser Phe Arg Asn Trp Pro Phe Tyr Val Gln Gly Ile 
165 170 175 

Ser Pro Cys Val Leu Ser Glu Ala Gly Phe Val Phe Thr Gly Lys Gln 
180 185 190 

Asp Thr Val Gln Cys Phe Ser Cys Gly Gly Cys Leu Gly Asn Trp Glu 
195 200 205 

Glu Gly Asp Asp Pro Trp Lys Glu His Ala Lys Trp Phe Pro Lys Cys 
210 215 220 

Glu Phe Leu Arg Ser Lys Lys Ser Ser Glu Glu Ile Thr Gln Tyr Ile 
225 230 235 240 

Gln Ser Tyr Lys Gly Phe Val Asp Ile Thr Gly Glu His Phe Val Asn 
245 250 255 

Ser Trp Val Gln Arg Glu Leu Pro Met Ala Ser Ala Tyr Cys Asn Asp 
260 265 270 

Ser Ile Phe Ala Tyr Glu Glu Leu Arg Leu Asp Ser Phe Lys Asp Trp 
275 280 285 

Pro Arg Glu Ser Ala Val Gly Val Ala Ala Leu Ala Lys Ala Gly Leu 
290 295 300 



Phe Tyr Thr Gly Ile Lys Asp Ile Val Gln Cys Phe Ser Cys Gly Gly 
305 310 315 320 

Cys Leu Glu Lys Trp Gln Glu Gly Asp Asp Pro Leu Asp Asp His Thr 
325 330 335 

Arg Cys Phe Pro Asn Cys Pro Phe Leu Gln Asn Met Lys Ser Ser Ala 
340 345 350 

Glu Val Thr Pro Asp Leu Gln Ser Arg Gly Glu Leu Cys Glu Leu Leu 
355 360 365 

Glu Thr Thr Ser Glu Ser Asn Leu Glu Asp Ser Ile Ala Val Gly Pro 
370 375 380 

Ile Val Pro Glu Met Ala Gln Gly Glu Ala Gln Trp Phe Gln Glu Ala 
385 390 395 400 

Lys Asn Leu Asn Glu Gln Leu Arg Ala Ala Tyr Thr Ser Ala Ser Phe 
405 410 415 

Arg His Met Ser Leu Leu Asp Ile Ser Ser Asp Leu Ala Thr Asp His 
420 425 430 

Leu Leu Gly Cys Asp Leu Ser Ile Ala Ser Lys His Ile Ser Lys Pro 
435 440 445 

Val Gln Glu Pro Leu Val Leu Pro Glu Val Phe Gly Asn Leu Asn Ser 
450 455 460 

Val Met Cys Val Glu Gly Glu Ala Gly Ser Gly Lys Thr Val Leu Leu 
465 470 475 480 

Lys Lys Ile Ala Phe Leu Trp Ala Ser Gly Cys Cys Pro Leu Leu Asn 
485 490 495 

Arg Phe Gln Leu Val Phe Tyr Leu Ser Leu Ser Ser Thr Arg Pro Asp 
500 505 510 

Glu Gly Leu Ala Ser Ile Ile Cys Asp Gln Leu Leu Glu Lys Glu Gly 
515 520 525 

Ser Val Thr Glu Met Cys Met Arg Asn Ile Ile Gln Gln Leu Lys Asn 
530 535 540 

Gln Val Leu Phe Leu Leu Asp Asp Tyr Lys Glu Ile Cys Ser Ile Pro 
545 550 555 560 

Gln Val Ile Gly Lys Leu Ile Gln Lys Asn His Leu Ser Arg Thr Cys 
565 570 575 

Leu Leu Ile Ala Val Arg Thr Asn Arg Ala Arg Asp Ile Arg Arg Tyr 
580 585 590 

Leu Glu Thr Ile Leu Glu Ile Gln Ala Phe Pro Phe Tyr Asn Thr Val 
595 600 605 

Cys Ile Leu Arg Lys Leu Phe Ser His Asn Met Thr Arg Leu Arg Lys 
610 615 620 

Phe Met Val Tyr Phe Gly Lys Asn Gln Ser Leu Gln Lys Ile Gln Lys 
625 630 635 640 

Thr Pro Leu Phe Val Ala Ala Ile Cys Ala His Trp Phe Gln Tyr Pro 
645 650 655 
Phe Asp Pro Ser Phe Asp Asp Val Ala Val Phe Lys Ser Tyr Met Glu 
660 665 670 

Arg Leu Ser Leu Arg Asn Lys Ala Thr Ala Glu Ile Leu Lys Ala Thr 
675 680 685 

Val Ser Ser Cys Gly Glu Leu Ala Leu Lys Gly Phe Phe Ser Cys Cys 
690 695 700 

Phe Glu Phe Asn Asp Asp Asp Leu Ala Glu Ala Gly Val Asp Glu Asp 
705 710 715 720 

Glu Asp Leu Thr Met Cys Leu Met Ser Lys Phe Thr Ala Gln Arg Leu 
725 730 735 

Arg Pro Phe Tyr Arg Phe Leu Ser Pro Ala Phe Gln Glu Phe Leu Ala 
740 745 750 

Gly Met Arg Leu Ile Glu Leu Leu Asp Ser Asp Arg Gln Glu His Gln 
755 760 765 

Asp Leu Gly Leu Tyr His Leu Lys Gln Ile Asn Ser Pro Met Met Thr 
770 775 780 

Val Ser Ala Tyr Asn Asn Phe Leu Asn Tyr Val Ser Ser Leu Pro Ser 
785 790 795 800 

Thr Lys Ala Gly Pro Lys Ile Val Ser His Leu Leu His Leu Val Asp 
805 810 815 

Asn Lys Glu Ser Leu Glu Asn Ile Ser Glu Asn Asp Asp Tyr Leu Lys 
820 825 830 

His Gln Pro Glu Ile Ser Leu Gln Met Gln Leu Leu Arg Gly Leu Trp 
835 840 845 

Gln Ile Cys Pro Gln Ala Tyr Phe Ser Met Val Ser Glu His Leu Leu 
850 855 860 

Val Leu Ala Leu Lys Thr Ala Tyr Gln Ser Asn Thr Val Ala Ala Cys 
865 870 875 880 

Ser Pro Phe Val Leu Gln Phe Leu Gln Gly Arg Thr Leu Thr Leu Gly 
885 890 895 

Ala Leu Asn Leu Gln Tyr Phe Phe Asp His Pro Glu Ser Leu Ser Leu 
900 905 910 

Leu Arg Ser Ile His Phe Ser Ile Arg Gly Asn Lys Thr Ser Pro Arg 
915 920 925 

Ala His Phe Ser Val Leu Glu Thr Cys Phe Asp Lys Ser Gln Val Pro 
930 935 940 

Thr Ile Asp Gln Asp Tyr Ala Ser Ala Phe Glu Pro Met Asn Glu Trp 
945 950 955 960 

Glu Arg Asn Leu Ala Glu Lys Glu Asp Asn Val Lys Ser Tyr Met Asp 
965 970 975 

Met Gln Arg Arg Ala Ser Pro Asp Leu Ser Thr Gly Tyr Trp Lys Leu 
980 985 990 

Ser Pro Lys Gln Tyr Lys Ile Pro Cys Leu Glu Val Asp Val Asn Asp 
995 1000 1005 
Ile Asp Val Val Gly Gln Asp Met Leu Glu Ile Leu Met Thr Val Phe 
1010 1015 1020 

Ser Ala Ser Gln Arg Ile Glu Leu His Leu Asn His Ser Arg Gly Phe 
1025 1030 1035 1040 

Ile Glu Ser Ile Arg Pro Ala Leu Glu Leu Ser Lys Ala Ser Val Thr 
1045 1050 1055 

Lys Cys Ser Ile Ser Lys Leu Glu Leu Ser Ala Ala Glu Gln Glu Leu 
1060 1065 1070 

Leu Leu Thr Leu Pro Ser Leu Glu Ser Leu Glu Val Ser Gly Thr Ile 
1075 1080 1085 

Gln Ser Gln Asp Gln Ile Phe Pro Asn Leu Asp Lys Phe Leu Cys Leu 
1090 1095 1100 

Lys Glu Leu Ser Val Asp Leu Glu Gly Asn Ile Asn Val Phe Ser Val 
1105 1110 1115 1120 

Ile Pro Glu Glu Phe Pro Asn Phe His His Met Glu Lys Leu Leu Ile 
1125 1130 1135 

Gln Ile Ser Ala Glu Tyr Asp Pro Ser Lys Leu Val Ala Ser Leu Pro 
1140 1145 1150 

Asn Phe Ile Ser Leu Lys Ile Leu Asn Leu Glu Gly Gln Gln Phe Pro 
1155 1160 1165 

Asp Glu Glu Thr Ser Glu Lys Phe Ala Tyr Ile Leu Gly Ser Leu Ser 
1170 1175 1180 

Asn Leu Glu Glu Leu Ile Leu Pro Thr Gly Asp Gly Ile Tyr Arg Val 
1185 1190 1195 1200 

Ala Lys Leu Ile Ile Gln Gln Cys Gln Gln Leu His Cys Leu Arg Val 
1205 1210 1215 

Leu Ser Phe Phe Lys Thr Leu Asn Asp Asp Ser Val Val Glu Ile Gly 
1220 1225 1230 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
ATGCTTGGAT CTCTAGAATG C 21 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
AGCAAAGACA TGTGGCGGAA 20 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CCAGCTCCTA GAGAAAGAAG GA 22 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GAACTACGGC TGGACTCTTT T 21 
(2) INFORMATION FOR SEQ ID NO: 7: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
CTCTCAGCCT GCTCTTCAGA T 21 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

AAAGCCTCTG AGGAGAGGAT C 21 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGACTGCCTG TTCATCTACG A 21 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
TTTGTTCTCC AGCCACATAC T 21 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CATTTGGCAT GTTCCTTCCA AG 22 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GTAGATGAAT ACTGATGTTT CATAATT 27 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
TGCCACTGCC AGGCAATCTA A 21 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
TAAACAGGAC AGGGTACAGT G 21 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CATGTTTTAA CTCTCGGTGC TCTG 24 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
TTAGCCAGAT GTGTTGGCAC ATG 23 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GATTCTATGT GATAGGCAGC GA 22 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GCCACTGCTC CCGATGGATT A 21 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GCTCTCAGCT GCTCATTCAG AT 22 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ACAAAGTTCA CCACGGCTCT G 21 
THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE 
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS: 



1. A human gene which maps to the SMA containing region 
of chromosome 5q13, said gene comprising exons 1 through 
17 of approximately 5.5 kb and having a restriction map 
for exons 2 through 11 as shown in Figure 8. 

2. A human gene which maps to the SMA containing region 
of chromosome 5q13, said gene comprising exons 1 through 
17 of approximately 5.5 kb and having a restriction map 
for exons 2 through 16 as shown in Figure 9D. 

3. A human gene of claim 1 or 2 wherein exons 5 through 
16 code for the NAIP protein having an amino acid 
sequence biologically functionally equivalent to the 
amino acid sequence of Sequence ID No. 2. 

4. A human gene of claim 1 or 2 wherein exons 5 through 
16 have a cDNA sequence biologically functionally 
equivalent to the cDNA sequence of Sequence ID No. 1. 

5. A purified nucleotide sequence comprising genomic 
DNA, cDNA, mRNA, anti-sense DNA or homologous DNA 
corresponding to the cDNA sequence of Sequence ID No. 1. 

6. A DNA molecule comprising a DNA sequence of Sequence 
ID No. 1. 

7. A DNA molecule comprising a DNA sequence coding for 
the NAIP protein having Sequence ID No. 2. 

8. A purified DNA sequence consisting essentially of 
DNA Sequence ID No. 1. 

9. A purified DNA sequence consisting essentially of a 
DNA sequence coding for amino acid Sequence ID No. 2. 
10. A purified DNA sequence comprising at least 18 
sequential bases of Sequence ID No. 1. 

11. A DNA probe comprising a DNA sequence of claim 10. 

12. A PCR primer comprising a DNA sequence of claim 10. 

13. A DNA hybridization molecule comprising a DNA 
sequence of claim 10. 

14. A purified DNA sequence of claim 10, 11, 12 or 13 
wherein said DNA sequence is selected from exons 1 
through 4 and 17 of Table 4. 

15. A purified DNA sequence of claim 10, 11, 12 or 13 
wherein said DNA sequence is selected from exons 5 
through 16 of Table 4. 

16. A purified DNA sequence of claim 10, 11, 12 or 13 
selected from the group of DNA sequences consisting of 
exon sequences 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 
14, 15, 16 and 17 of Table 4. 

17. A purified DNA sequence of claim 16 wherein the DNA 
sequence is selected from exons 4, 5, 6 and 7. 

18. A purified DNA sequence of claim 16 wherein the DNA 
sequence is selected from exons 5 and 6. 

19. Use of DNA sequences of claims 1, 2, 3, 4, 5, 6, 7, 
8, 9, or 10 in the construction of a cloning vector or an 
expression vector. 

20. NAIP protein encoded by a DNA sequence of claims 1, 
2, 3, 4, 5, 6, 7, 8, or 9. 
21. A 15 amino acid fragment of NAIP protein encoded by 
45 sequential bases of the DNA sequence of claim 10. 

22. NAIP protein comprising an amino acid sequence 
biologically equivalent to the amino acid sequence of 
Sequence ID No. 2. 

23. NAIP protein consisting essentially of the amino 
acid sequence of Sequence ID No. 2. 

24. NAIP protein fragment comprising at least 15 
sequential amino acids of Sequence ID No. 2. 

25. NAIP protein fragment comprising an amino acid 
sequence selected from the group of polypeptides encoded 
by one of exons 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 and 
16. 

26. NAIP protein fragment of claim 25 wherein selected 
polypeptides are those encoded by exons 5, 6, 7, 8, 9, 
10, 11 or 12. 

27. NAIP protein fragment of claim 25 wherein selected 
polypeptides are those encoded by exons 5 and 6. 

28. Use of amino acid sequences of claims 20, 21, 22, 
23, 24, 25 or 26 in production of hybridomas. 

29. A method for analyzing a biological sample to 
diagnose presence or absence of a gene encoding NAIP 
protein, said method comprising: 

i) providing a biological sample derived from the 
SMA containing region q13 of human chromosome 5; 

ii) conducting a biological assay to determine 
presence or absence in said biological sample of at least 
a member selected from the group consisting of: 

a) NAIP DNA Sequence ID No. 1, and 
b) NAIP protein Sequence ID No. 2. 



30. A method of claim 29 for diagnosing a human's risk 
of developing SMA wherein in step ii) mutations in said 
sequence of group a) or b) are assayed. 

31. A method of claim 30 wherein presence or absence of 
exons 5 and 6 of group a) are assayed. 

32. A method of claim 31 further comprising determining 
intact gene copy number in chromosome 5 of a gene 
encoding said NAIP protein. 

33. A method of claim 29, 30, 31 or 32 wherein said 
biological assay comprises an assay selected from the 
group consisting of DNA hybridization, restriction enzyme 
analysis, PCR amplification, mRNA detection and DNA 
sequencing. 

34. A method of claim 29, 30, 31 or 32 wherein said 
biological assay comprises PCR amplification of exon 
regions 5 and 6 by use of PCR primers selected from the 
5' region of exon 5 and 3' region of exon 6. 